1. Introduction

In broadest terms, the method of averaging (or "averaging principle") may be described as follows: to approx- 
imate the evolution of a system with motions occurring on both fast and slow timescales, one uses a simpler 
system obtained by somehow averaging over the fast motion of the original system. In the context of difference 
equations (or "iterated maps"), the most elementary situation to which the method applies occurs in periodic 
systems of the form 

$$x_{n+1} = x_n + \varepsilon f(x_n, n) \tag{1.1}$$

where $x_n \in U \subset \mathbb{R}^d$, $n \in \mathbb{N}$, $\varepsilon > 0$ is a small parameter, and $f : U \times \mathbb{N} \to \mathbb{R}^d$ is a bounded, locally $x$-Lipschitz,
discrete-time-dependent function of period $p$ in $n$. Solutions of system (1.1) are approximated by solutions of
the associated averaged system

$$y_{n+1} = y_n + \varepsilon \bar f(y_n) \tag{1.2}$$

where the autonomous function $\bar f : U \to \mathbb{R}^d$ (the average of $f$) is given by $\bar f(y) = (1/p) \sum_{n=0}^{p-1} f(y, n)$. In
this context the averaging principle asserts that solutions $x_n$ of Eq. (1.1) and $y_n$ of Eq. (1.2) that start at the
same initial condition remain $O(\varepsilon)$-close on a discrete timescale of $O(1/\varepsilon)$. It is also often useful to use the
continuous-time solutions of the corresponding averaged ODE

$$\frac{dy}{dt} = \varepsilon \bar f(y) \tag{1.3}$$

to approximate the discrete-time solutions of Eq. (1.2) and hence also those of Eq. (1.1), so that we obtain
the two approximation relations $x_n = y_n + O(\varepsilon)$ and $x_n = y(n) + O(\varepsilon)$ for $0 \le n \le O(1/\varepsilon)$ (note that $y_n$ and
$y(n)$ have different meanings). A more precise formulation appears below in Theorem 1, followed by a very
elementary proof that makes no use of the usual transformation that appears in textbooks (it is not always 
recognized that first-order averaging may be justified without the sort of coordinate transformations used, for 
example, in canonical perturbation theory). 

Equation (1.1) is a special case of a more general problem on which we focus in this paper. Let $\nu \in \mathbb{R}$,
$U \subset \mathbb{R}^d$, and $f : U \times \mathbb{R} \to \mathbb{R}^d$ be periodic with period 1 in its second argument. We then consider the system

$$x_{n+1} = x_n + \varepsilon f(x_n, n\nu). \tag{1.4}$$

The analysis of this problem is similar to the analysis of the flow problem $dx/dt = \varepsilon f(x, t)$ when $f$ is quasiperiodic
in $t$ with two base frequencies, since small divisors enter both problems in the same way. Clearly Eq. (1.4) reduces
to Eq. (1.1) when $\nu = q/p$ is rational. For $\nu$ irrational, we know from Weyl's equidistribution theorem [K6]
that the average of $f(x, n\nu)$ over $n$ exists and equals $\bar f(x) = \int_0^1 f(x, t)\,dt$. It is therefore natural to ask for what
values of $\nu$ the solutions of Eq. (1.4) can be approximated by solutions of the two systems

$$y_{n+1} = y_n + \varepsilon \bar f(y_n) \tag{1.5}$$

and 

$$\frac{dy}{dt} = \varepsilon \bar f(y). \tag{1.6}$$

In answering this question, it also seems natural (from the mathematical viewpoint) to introduce Diophantine
conditions on $\nu$, but these conditions in their usual form are problematic in applications, and not wholly
necessary, as we shall see. In fact, we present approximation theorems that are both theoretically satisfying
and suited to applications. In particular, we weaken the usual small divisor conditions on $\nu$ (in which $\nu$ satisfies
infinitely many "Diophantine conditions"), requiring instead only finitely many conditions at appropriately low
order. These conditions exclude $\nu$ from zones centered on low-order rationals, and in this "far-from-low-order-resonance
case" (where $\nu$ satisfies only "truncated Diophantine conditions" and is not necessarily irrational),
we again find that $x_n = y_n + O(\varepsilon) = y(n) + O(\varepsilon)$ for $0 \le n \le O(1/\varepsilon)$ (see Theorem 2 below). Under the
additional hypothesis that the average of the perturbation vanishes, we are able to show adiabatic invariance of
solutions of system (1.4) on extended timescales up to $O(1/\varepsilon^2)$ (see Theorem 3). We thus have results for both
low-order resonant (or rational) $\nu$, and for $\nu$ far from low-order resonance.

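Finally, a simple trick permits us to explore $O(\varepsilon)$ neighborhoods of low-order resonances $\nu = q/p$: we set
$\nu = q/p + \varepsilon a$ (where $a \in \mathbb{R}$ should be viewed as a measure of the $O(\varepsilon)$ displacement from the resonance) and
rewrite Eq. (1.4) as the system

$$\begin{pmatrix} x_{n+1} \\ \tau_{n+1} \end{pmatrix} = \begin{pmatrix} x_n + \varepsilon f(x_n, \tfrac{q}{p} n + \tau_n) \\ \tau_n + \varepsilon a \end{pmatrix}. \tag{1.7}$$

This is in the form of Eq. (1.1) with $x_n$ replaced by $(x_n, \tau_n)^T$. Writing $\bar f(x, \tau) = (1/p) \sum_{n=0}^{p-1} f(x, nq/p + \tau)$, the
averaged problem reduces to

$$\begin{pmatrix} y_{n+1} \\ \tau_{n+1} \end{pmatrix} = \begin{pmatrix} y_n + \varepsilon \bar f(y_n, \tau_n) \\ \tau_n + \varepsilon a \end{pmatrix} \tag{1.8}$$

and we recapture the relations $x_n = y_n + O(\varepsilon) = y(n) + O(\varepsilon)$ for $0 \le n \le O(1/\varepsilon)$, where $y(t)$ is the solution
of the system

$$\frac{d}{dt} \begin{pmatrix} y \\ \tau \end{pmatrix} = \varepsilon \begin{pmatrix} \bar f(y, \tau) \\ a \end{pmatrix}, \tag{1.9}$$

which is equivalent to the non-autonomous system $dy/dt = \varepsilon \bar f(y, \varepsilon a t)$; see Proposition C below.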

Initially, we state Theorems 1, 2, and 3 under the hypothesis that the perturbation $\varepsilon f$ has compact support
in its $x$-domain, which is assumed to be all of $\mathbb{R}^d$; this avoids a priori restrictions on $\varepsilon$ and permits clear proofs.
To obtain results better suited to applications, we then give propositions that extend our theorems to more
general perturbations on more general domains, and also to more general Diophantine conditions in which
the zones mentioned above are allowed to depend on $\varepsilon$; this in turn allows $\nu$ to come within $O(\varepsilon^\lambda)$ of low-order
rationals, but with loss of accuracy in the approximation (see Propositions A and B below). Using the
generalized versions of our theorems (provided by Propositions A, B, and C), we obtain an essentially complete
description of solutions of system (1.4) on $O(1/\varepsilon)$ timescales for various values of $\nu$ (there are however thin gaps
at the boundaries between the $\nu$ for which resonant and nonresonant motions occur; cf. Remark 2.5 below).

From the viewpoint of applied mathematics, perhaps the most interesting aspect of our results is that our 
Theorems 2 and 3 have physically realistic, truncated Diophantine conditions in their hypotheses, yet provide 
approximations valid on full $O(1/\varepsilon)$ time intervals. For more general multiphase averaging principles, such nice
hypotheses lead to passage through resonance, and thus to approximations that are valid only on somewhat
shorter time intervals (cf. [ABG]); but we have identified an important class of simpler problems arising from
accelerator beam dynamics in which both the realistic hypotheses and the full $O(1/\varepsilon)$ validity times can coexist.

More generally, averaging principles for maps are not new; results in this direction have been available 
since the 1960s (cf. for example [Bel], [Dr]). However, a detailed theory of Eq. (1.4) suitable for applications 
appears to be missing from the literature, and we proceed to fill that gap in this paper. We do not however 
illustrate the full range of applicability of our theorems; instead we discuss a single important example from the 
class of problems which motivated this investigation, namely the so-called "kick-rotate" models from accelerator 
dynamics, represented by 

$$w_{n+1} = M(w_n + \varepsilon K(w_n)),$$

which takes the form of Eq. (1.4) under the transformation $w_n = M^n x_n$. In this paper, we emphasize this model's
application to the so-called weak-strong beam-beam interaction (see §3.2 below), but kick-rotate models also 
apply to other localized perturbations in accelerators. 

We point out that our discussion below in Section 3 is the first mathematically rigorous treatment of this 
important class of models in the sense of asymptotics. Many beam dynamics treatments start with a smooth 
Hamiltonian formulation and apply canonical perturbation theory without rigorous error analysis. Resonances 
are often not treated in the spirit of perturbation theory (see however the paper [Ru] for a nice discussion of 
the use of perturbation theory in beam dynamics). Furthermore, delta function perturbations are often used in
this smooth Hamiltonian framework (it is of course more natural to use them with maps), making the validity 
of any resulting approximations hard to assess. (The paper [CBW] gives a nice introduction to the beam-beam 
interaction, but uses this Hamiltonian/delta function approach.) One notable exception to the Hamiltonian 
formulation is the work on maps using Lie operators, a good discussion of which may be found in [Fo], where 
the author has carried this approach quite far — to realistic machine models — but without focusing on rigorous 
asymptotics. We are aware of another research group working on highly mathematical perturbation treatments 
of beam dynamics in the context of maps [BGSTT] but our work here is quite distinct from theirs. To begin 
with, our perturbation parameter is the size of the "kick" (cf. Section 3.1 below), whereas they study the long 
time stability of the origin (which is assumed to be a linearly stable elliptic fixed point), using the distance 
from the origin as a perturbation parameter. Furthermore, their analysis is quite complex, as they pursue
Nekhoroshev-type results involving many successive coordinate transformations which give rise to complicated
and restrictive hypotheses that may be difficult to verify in practice. In our own approach, resonances are treated 
in the simplest possible rigorous way, and we obtain a natural partition of "tune space" into regions with distinct 
resonance properties. We believe this is an important new feature, both conceptually and practically. Of course, 
it is important to note that our method gives approximations to leading order only (using no transformations, 
as mentioned earlier); this accounts for much of its radical simplicity. It also allows us to use simple and 
realistic hypotheses, in turn permitting meaningful comparison of the kick-rotate approximation with numerical 
experiments. Overall, we believe that our treatment provides the starting point for a simple, effective means 
of studying mathematical models of beam dynamics rigorously, and that its development should complement 
previous theoretical and mathematical work. 

The remainder of this paper is organized as follows. In Section 2 we present the details of our averaging 
results described informally above. In Section 3 we apply the averaging principles to model problems in ac- 
celerator beam dynamics, showing that solutions of a class of "kick-rotate" models are well-approximated by 
solutions of the corresponding averaged models. We also apply the adiabatic invariance principle to the Hénon
map (often used to model sextupole magnets in accelerators). In Section 4, we formulate the main technical 
tools required to prove the results in Section 2. These are the so-called Besjes inequality for periodic functions 
(Lemma 1, §4.1), and its generalization to functions far from low-order resonance (Lemma 2, §4.2.2). After 
formulating and proving these inequalities, we use them to prove the mathematical results from Section 2. 
Finally, for the sake of completeness, in the Appendix we state and prove two elementary results used in earlier 
proofs. 

We end this introduction with a few words about notation. We use the symbols $\mathbb{N}$, $\mathbb{R}$, $\mathbb{R}^+$, and $\mathbb{Z}$ to
denote, respectively, the counting numbers $\{0, 1, 2, \ldots\}$, the real numbers, the positive real numbers, and the
integers. The symbol $|\cdot|$ indicates the Euclidean norm on $\mathbb{R}^d$ (or the absolute value $|k|$ of an integer $k$), and
$\|\cdot\|_S$ denotes the uniform norm of a function over the set $S$; i.e., $\|F\|_S := \sup_{x \in S} |F(x)|$.

2. Averaging Principles and Adiabatic Invariance 

In this section we state — and provide brief remarks on — our approximation results for maps as discussed in the 
introduction above. 

2.1 Averaging for Maps with Periodic Perturbations 

Let us be more precise about the functions $f$ in Eq. (1.1) to which our results apply. First, taking
$S = \mathbb{R}^d \times \mathbb{N}$, we assume that $f : S \to \mathbb{R}^d$ satisfies the following:

(i) $f$ is bounded on $S$ and $f(\cdot, n)$ is locally Lipschitz, uniformly in $n$

(ii) There exists a positive integer $p$ such that $(x, n) \in S \Rightarrow f(x, n + p) = f(x, n)$

(iii) There is an $r > 0$ such that $|x| > r$ and $n \in \mathbb{N} \Rightarrow f(x, n) = 0$

When $f$ satisfies (ii), we say it is "periodic with period $p$ in its second argument"; and when it satisfies (iii),
it is "compactly supported in $x$, uniformly in $n$." It follows from (i) and (iii) that $f$ is globally Lipschitz in $x$,
uniformly in $n$. In Subsection 2.4 we show how to treat the case where $f$ is not compactly supported.






We now state a simple averaging principle for maps with periodic perturbation $\varepsilon f(x, n)$ and corresponding
averaged perturbation $\varepsilon \bar f(y) = (\varepsilon/p) \sum_{n=0}^{p-1} f(y, n)$:

Theorem 1. Let $S = \mathbb{R}^d \times \mathbb{N}$, and suppose $f : S \to \mathbb{R}^d$ satisfies assumptions (i), (ii), and (iii) above. Fix
$\varepsilon \in (0, 1]$, and consider the system

$$x_{n+1} = x_n + \varepsilon f(x_n, n) \tag{1.1}$$

together with the associated averaged systems

$$y_{n+1} = y_n + \varepsilon \bar f(y_n) \quad (1.2), \qquad \text{and} \qquad \frac{dy}{dt} = \varepsilon \bar f(y). \quad (1.3)$$

Choose $T > 0$ to capture the desired properties of system (1.3) on $[0, T/\varepsilon]$. Then there exist positive constants
$C = C(T)$ and $C' = C'(T)$ such that the solutions $x_n$, $y_n$, and $y(t)$ of Eqs. (1.1), (1.2), and (1.3) with common
initial condition $x_0 = y_0 = y(0)$ exist uniquely for all time and satisfy $|x_n - y_n| \le C p \varepsilon$ and $|x_n - y(n)| \le
(Cp + C')\varepsilon$ for $0 \le n \le T/\varepsilon$.
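The content of Theorem 1 can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: the scalar perturbation `f`, the period `P = 4`, and the initial condition are hypothetical choices of ours, and `f` is neither bounded nor compactly supported, so strictly speaking the example sits in the setting of Proposition B rather than Theorem 1 itself. For this `f` the average is $\bar f(y) = -y$, and the averaged ODE has the explicit solution $y(t) = x_0 e^{-\varepsilon t}$.

```python
import math

P = 4  # period of the hypothetical perturbation (an assumption of this sketch)

def f(x, n):
    # Hypothetical period-4 perturbation; the oscillating term averages to zero.
    return -x + math.cos(2 * math.pi * n / P) * math.cos(x)

def f_bar(y):
    # Average of f over one period: (1/P) * sum_{n=0}^{P-1} f(y, n) = -y.
    return -y

def max_errors(eps, T=1.0, x0=1.0):
    """Return (max |x_n - y_n|, max |x_n - y(n)|) over 0 <= n <= T/eps."""
    x = y = x0
    err_map = err_flow = 0.0
    for n in range(int(T / eps) + 1):
        y_flow = x0 * math.exp(-eps * n)  # exact solution of dy/dt = eps * f_bar(y)
        err_map = max(err_map, abs(x - y))
        err_flow = max(err_flow, abs(x - y_flow))
        x, y = x + eps * f(x, n), y + eps * f_bar(y)
    return err_map, err_flow

if __name__ == "__main__":
    for eps in (0.04, 0.02, 0.01):
        print(eps, max_errors(eps))
```

Halving `eps` should roughly halve both reported errors while doubling the number of iterates covered, which is the $O(\varepsilon)$ accuracy on an $O(1/\varepsilon)$ timescale asserted by the theorem.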

2.2 Averaging for Maps With Perturbations Far From Low-Order Resonance 

We now present an averaging principle for system (1.4), where $\nu$ is a fixed positive number. When we
write $\nu = q/p$, we mean that $q$ and $p > 0$ are relatively prime integers with the order of the rational number
$\nu$ given by $p > 0$. Using this convention, we first note that if $\nu = q/p$, then $f(x, n\nu)$ has integer period $p$ in
$n$, and Theorem 1 applies. In fact, as we shall see in Proposition C, Theorem 1 applies not only at low-order
rationals but also near them. However, since the error estimate in this theorem is proportional to $p$, it is not
very useful when $p$ is "large." We therefore restrict use of Theorem 1 to situations where $p$ is "small" (the
"low-order-resonance case"), and we next focus on situations where $\nu$ is far from low-order rational numbers
(the "far-from-low-order-resonance case"). In this case small divisors inevitably enter the analysis (see the
proof of Lemma 2, §4.2.2) and it might be expected that $\nu$ would need to be "highly irrational" (e.g. satisfy
infinitely many Diophantine conditions). We show instead that the averaging principle may be established when
$\nu$ satisfies only finitely many Diophantine conditions to a certain order, and we call these truncated Diophantine
conditions.

In more precise terms, $\nu$ satisfies truncated Diophantine conditions if it belongs to the set $\mathcal{D}(\phi, R)$ defined
below in Eq. (4.3), where $\phi$ is the zone function of the Diophantine condition and $R > 0$ is the truncation
order or ultraviolet cutoff, which gives precise meaning to the phrase "$p$ large" used above (i.e., $p$ is large if
$p > R$). Roughly speaking, $\mathcal{D}(\phi, R)$ is constructed by removing open intervals centered on low-order rationals
$\nu = q/p$. The zone function $\phi$ controls the size of the intervals removed, and the cutoff $R$ is the maximal order
of rationals from around which intervals are removed. These terms are defined precisely in Subsection 4.2.1 (to
fully understand the difference between truncated and ordinary Diophantine conditions, and to appreciate the
advantages offered by the former, the reader may find it worthwhile to read that subsection).
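To make the construction concrete, here is a minimal sketch of a membership test for a set of this kind. It is not the definition in Eq. (4.3), which is not reproduced in this section: we assume, following the description in Remark 2.5, that the zone removed around $q/p$ has half-width $\phi(p)/p$, and the zone function `phi` used in the demonstration is a hypothetical choice.

```python
import math

def in_truncated_set(nu, phi, R):
    """True if |nu - q/p| >= phi(p)/p for every integer q and every 1 <= p <= R.

    Assumed form of the truncated Diophantine condition (see the lead-in);
    only finitely many orders p are tested, which is the point of truncation.
    """
    for p in range(1, R + 1):
        q = round(nu * p)  # nearest numerator, so q/p is the closest rational of order p
        if abs(nu - q / p) < phi(p) / p:
            return False
    return True

if __name__ == "__main__":
    phi = lambda p: 0.05 / p          # hypothetical zone function
    golden = (math.sqrt(5) - 1) / 2   # a tune far from low-order rationals
    print(in_truncated_set(golden, phi, 50))  # far from all rationals tested
    print(in_truncated_set(1 / 3, phi, 2))    # rational of order 3 > R = 2
    print(in_truncated_set(1 / 3, phi, 3))    # excluded once order 3 is tested
```

The second and third calls illustrate that a rational tune can satisfy the truncated conditions as long as its order exceeds the cutoff $R$.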

With truncated Diophantine conditions given explicitly in Eq. (4.3), we now consider the class of functions
to which our next result applies. For $S = \mathbb{R}^d \times \mathbb{R}$ we consider functions $f : S \to \mathbb{R}^d$ satisfying the following
conditions (analogous to (i) through (iii) in §2.1):

(j) $f$ is of class $C^4$ on $S$

(jj) $(x, \theta) \in S \Rightarrow f(x, \theta + 1) = f(x, \theta)$

(jjj) There is an $r > 0$ such that $|x| > r$ and $\theta \in \mathbb{R} \Rightarrow f(x, \theta) = 0$

Terminology for describing conditions (jj) and (jjj) is similar to that for describing conditions (ii) and (iii)
above in Subsection 2.1. Since we assume $f$ has unit period in its second argument, its average $\bar f$ is simply
$\bar f(y) := \int_0^1 f(y, \theta)\,d\theta$. Finally, we alert the reader that the truncated Diophantine conditions satisfied by $\nu$ must
be adapted to $f$ in the sense that the zone function $\phi$ must decay appropriately; this is made precise in Eq.
(4.2) of Subsection 4.2.1 (basically $\phi$ must decay fast enough so that $\mathcal{D}(\phi, R)$ is nonempty, but slow enough so
that the series in Eq. (4.2) converges; this accounts for assumption (j) above and our specific choice of $\phi$ as
discussed in §4.2.1).

We now state our averaging principle for maps with perturbations $\varepsilon f(x, n\nu)$ far from low-order resonance
and averaged perturbation $\varepsilon \bar f(y)$ as above:

Theorem 2. Let $S = \mathbb{R}^d \times \mathbb{R}$, suppose $f : S \to \mathbb{R}^d$ satisfies assumptions (j), (jj), and (jjj) above, and suppose
the zone function $\phi$ is adapted to $f$ on $\mathbb{R}^d$ in the sense of Eq. (4.2). Fix $\varepsilon \in (0, 1]$, and consider the system

$$x_{n+1} = x_n + \varepsilon f(x_n, n\nu) \tag{1.4}$$

together with the associated averaged systems

$$y_{n+1} = y_n + \varepsilon \bar f(y_n) \quad (1.5), \qquad \text{and} \qquad \frac{dy}{dt} = \varepsilon \bar f(y). \quad (1.6)$$

Choose $T > 0$ to capture the desired properties of system (1.6) on $[0, T/\varepsilon]$. Then there exist positive constants
$R_\varepsilon$, $C = C(f, \phi, T)$, and $C' = C'(f, \phi, T)$ such that whenever $\nu \in \mathcal{D}(\phi, R_\varepsilon)$ (defined in Eq. (4.3)), the solutions
$x_n$, $y_n$, and $y(t)$ of Eqs. (1.4), (1.5), and (1.6) with common initial condition $x_0 = y_0 = y(0)$ exist uniquely for
all time and satisfy $|x_n - y_n| \le C\varepsilon$ and $|x_n - y(n)| \le C'\varepsilon$ for $0 \le n \le T/\varepsilon$.
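The role of the hypothesis on $\nu$ can be seen in a small experiment. The following sketch is an illustration only: the scalar perturbation `f` is a hypothetical choice whose $\theta$-average is $-x$ (it is not compactly supported, so Proposition B is implicitly invoked), and the two tunes compared are the golden mean, which is far from low-order rationals, and the exact resonance $\nu = 0$.

```python
import math

def f(x, theta):
    # Hypothetical perturbation with unit period in theta; its theta-average is -x.
    return -x + math.cos(2 * math.pi * theta) * math.cos(x)

def max_error(eps, nu, T=1.0, x0=1.0):
    """max |x_n - y_n| over 0 <= n <= T/eps, for Eq. (1.4) versus Eq. (1.5)
    with the nonresonant average f_bar(y) = -y."""
    x = y = x0
    err = 0.0
    for n in range(int(T / eps) + 1):
        err = max(err, abs(x - y))
        x, y = x + eps * f(x, n * nu), y - eps * y
    return err

if __name__ == "__main__":
    golden = (math.sqrt(5) - 1) / 2
    for eps in (0.02, 0.01):
        print(eps, max_error(eps, golden), max_error(eps, 0.0))
```

For the golden-mean tune the error shrinks in proportion to `eps`, whereas at the resonance $\nu = 0$ it stays of order one, because there the perturbation does not average to $\bar f$ at all.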

Remark 2.1 For averaging principles of this type, it is natural to consider the average
$\lim_{N \to \infty} (1/N) \sum_{n=0}^{N-1} f(x, n\nu)$ of $f$ over $n$ as mentioned in the introduction. Under mild integrability conditions
on $f$, it can be shown that when $\nu$ is irrational, this average converges to $\int_0^1 f(x, \theta)\,d\theta$, which is the average
used here (this is related to Weyl's equidistribution theorem; cf. [Br] and [K6]). However, our results do not
require the existence of the average of $f(x, n\nu)$ over $n$, nor do they require $\nu$ to be irrational; instead we require
$\nu \in \mathcal{D}(\phi, R_\varepsilon)$, and this latter set contains many rationals of order greater than $R_\varepsilon$.

2.3 Adiabatic Invariance on Extended Timescales 

In this subsection, we consider a special system somewhat like a perturbation of an integrable Hamiltonian
system. As in Theorem 2, we assume that $\nu$ satisfies truncated Diophantine conditions, but now we assume
additionally that the perturbation $\varepsilon f$ has zero mean; i.e., we assume that

(jv) For each $x \in \mathbb{R}^d$, $\int_0^1 f(x, \theta)\,d\theta = 0$

This extra hypothesis gives an averaging principle showing that the action-like variables are adiabatically
invariant over timescales longer than $O(1/\varepsilon)$:

Theorem 3. Let $S = \mathbb{R}^d \times \mathbb{R}$, suppose $f : S \to \mathbb{R}^d$ satisfies conditions (j), (jj), (jjj), and (jv) above, and
suppose the zone function $\phi$ is adapted to $f$ on $\mathbb{R}^d$ (as in Eq. (4.2)). Fix $\varepsilon \in (0, 1]$, choose $T > 0$, and consider
the system

$$x_{n+1} = x_n + \varepsilon f(x_n, n\nu) \tag{1.4}$$

with arbitrary initial condition $x_0 \in \mathbb{R}^d$. Then there exist positive constants $R_\varepsilon$, $K_1 = K_1(f, \phi)$, and $K_2 =
K_2(f, \phi)$ such that whenever $\nu \in \mathcal{D}(\phi, R_\varepsilon)$ (cf. Eq. (4.3)), the solution $x_n$ of Eq. (1.4) satisfies $|x_n - x_0| \le
K_1 \varepsilon + K_2 \varepsilon^2 n$ for $n \in \mathbb{N}$. In particular, for $0 \le \alpha \le 1$, we have $|x_n - x_0| \le C(T)\,\varepsilon^\alpha$ for $0 \le n \le T/\varepsilon^{2-\alpha}$, where
$C(T) = K_1 + K_2 T$.
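The shape of the bound $K_1 \varepsilon + K_2 \varepsilon^2 n$ can be explored numerically. The sketch below rests on assumptions of ours: a hypothetical scalar zero-mean perturbation, the golden-mean tune, and illustrative constants `K1` and `K2` that are guesses for this example and not the constants of Theorem 3. It reports the worst ratio of the observed drift to the bound over the long timescale $0 \le n \le T/\varepsilon^2$.

```python
import math

NU = (math.sqrt(5) - 1) / 2  # golden-mean tune, far from low-order rationals

def f(x, theta):
    # Hypothetical zero-mean perturbation: the theta-average of f(x, .) vanishes.
    return math.cos(2 * math.pi * theta) * math.cos(x)

def max_bound_ratio(eps, T=1.0, x0=1.0, K1=2.0, K2=0.25):
    """max over 0 <= n <= T/eps^2 of |x_n - x_0| / (K1*eps + K2*eps^2*n).

    K1 and K2 are illustrative guesses, not the constants of Theorem 3.
    """
    x = x0
    worst = 0.0
    for n in range(int(T / eps**2) + 1):
        worst = max(worst, abs(x - x0) / (K1 * eps + K2 * eps**2 * n))
        x += eps * f(x, n * NU)
    return worst

if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.02):
        print(eps, max_bound_ratio(eps))
```

A ratio that stays below one for each `eps` is consistent with a drift of size $O(\varepsilon)$ on the $O(1/\varepsilon)$ timescale, growing to at most $O(1)$ on the $O(1/\varepsilon^2)$ timescale.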

Remark 2.2 Using second (or higher) order averaging, it is possible to get a better estimate of $|x_n - x_0|$ on
the full $O(1/\varepsilon^2)$ time interval (see [ES] for a flow version).

2.4 Extensions and Generalizations 

In this subsection we give three propositions that extend and generalize our results above, making them 
more suitable for applications. Our first proposition shows that Theorems 2 and 3 may be generalized to the 
case where the zones of the truncated Diophantine conditions depend on e. 

Proposition A ($\varepsilon$-dependent zone functions). Suppose that $0 \le \lambda < 1$, and that in Theorem 2 [or Theorem
3], the zone function $\phi$ is replaced by the new zone function $\varepsilon^\lambda \phi$. Then the conclusions of the theorem remain
true, provided that the error estimates $C\varepsilon$ and $C'\varepsilon$ are modified to read $C\varepsilon^{1-\lambda}$ and $C'\varepsilon^{1-\lambda}$ [or $C(T)\varepsilon^\alpha$ is
modified to read $C(T)\varepsilon^{\alpha-\lambda}$].

In order to clarify and simplify the mathematical structure of our methods, we have presented Theorems
1, 2, and 3 under the assumption that the perturbations have compact support on spatial domains that are all
of $\mathbb{R}^d$. Our next proposition shows that this assumption may be removed at little cost.

Proposition B (more general perturbations). Suppose that the domain $S = \mathbb{R}^d \times \mathbb{N}$ in Theorem 1 is
replaced by the more general domain $S' = U \times \mathbb{N}$, where $U \subset \mathbb{R}^d$ is open [or the domain $S = \mathbb{R}^d \times \mathbb{R}$ in
Theorem 2 or 3 is replaced by $S' = U \times \mathbb{R}$, $U \subset \mathbb{R}^d$ open], and assumption (iii) is removed from the hypotheses
of Theorem 1 [or (jjj) is removed from the hypotheses of Theorem 2 or 3]. Then the conclusions of Theorem 1
[or Theorem 2 or 3] remain true provided that: (a) $0 < \varepsilon \le \varepsilon_0$, where the threshold $\varepsilon_0 > 0$ may be estimated
as outlined below in Subsection 4.3.2; and (b) the conclusion "exist uniquely for all time" is replaced by "exist
uniquely on the time interval $[0, T/\varepsilon]$," with $T > 0$ chosen strictly less than $\beta(x_0)$, where $[0, \beta(x_0))$ is the
maximal forward interval of existence for the averaged flow problem $dy/dt' = \bar f(y)$ in the domain $U$ [or for the
flow problem $dy/dt' = \bar f(y)$ in $U$].

Remark 2.3 Of course Proposition A also applies to Proposition B. 

The following proposition shows that Theorem 1 may be used to analyze the dynamics of solutions of Eq.
(1.4) in $O(\varepsilon)$ neighborhoods of low-order resonances $\nu = q/p$.

Proposition C (behavior near low-order resonance). Let $U \subset \mathbb{R}^d$ be open, $S' = U \times \mathbb{R}$, and suppose
$f : S' \to \mathbb{R}^d$ satisfies conditions (j) and (jj) of Theorem 2 with $S$ replaced by $S'$. Fix the rational number $q/p$,
$p > 0$ and $q$ relatively prime, and fix $a \in \mathbb{R}$. Then Eq. (1.4) with $\nu = q/p + a\varepsilon$ may be rewritten as Eq. (1.7),
and Theorem 1 together with Proposition B apply with $x$ and $y$ replaced by $(x, \tau)^T$ and $(y, \tau)^T$ respectively.
In particular there are positive constants $\varepsilon_0$, $c = c(T, |a|)$, and $c' = c'(T, |a|)$ such that $|x_n - y_n| \le c p \varepsilon$ and
$|x_n - y(n)| \le (cp + c')\varepsilon$ for $0 < \varepsilon \le \varepsilon_0$ and $0 \le n \le T/\varepsilon$.

Remark 2.4 Clearly $y_n$ evolves by $y_{n+1} = y_n + \varepsilon \bar f(y_n, \varepsilon a n)$; and also $y(n) = \tilde y(\varepsilon n)$, where $\tilde y$ evolves via
$d\tilde y/dt = \bar f(\tilde y, at)$.
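The near-resonance recipe of Proposition C can be sketched in a few lines. Everything specific here is an assumption made for illustration: the scalar perturbation `f` with two harmonics, the resonance $q/p = 1/2$, and the detuning $a = 0.125$. The code iterates Eq. (1.4) with $\nu = q/p + \varepsilon a$ and compares it with two candidate averaged maps: the resonant one, $y_{n+1} = y_n + \varepsilon \bar f(y_n, \varepsilon a n)$, and the nonresonant one built from the full $\theta$-average, which for this `f` is $-y$.

```python
import math

def f(x, theta):
    # Hypothetical unit-period perturbation with first and second harmonics.
    osc = math.cos(2 * math.pi * theta) + math.cos(4 * math.pi * theta)
    return -x + osc * math.cos(x)

def f_bar(y, tau, q=1, p=2):
    # Resonant average (1/p) * sum_{n=0}^{p-1} f(y, n*q/p + tau).
    return sum(f(y, n * q / p + tau) for n in range(p)) / p

def max_errors(eps, a=0.125, q=1, p=2, T=1.0, x0=1.0):
    """Errors of the resonant and nonresonant averaged maps for nu = q/p + eps*a."""
    nu = q / p + eps * a
    x = y_res = y_non = x0
    err_res = err_non = 0.0
    for n in range(int(T / eps) + 1):
        err_res = max(err_res, abs(x - y_res))
        err_non = max(err_non, abs(x - y_non))
        x_next = x + eps * f(x, n * nu)
        y_res += eps * f_bar(y_res, eps * a * n, q, p)
        y_non += eps * (-y_non)  # theta-average of f over a full period is -y
        x = x_next
    return err_res, err_non

if __name__ == "__main__":
    for eps in (0.02, 0.01):
        print(eps, max_errors(eps))
```

In this example the first harmonic is removed by the period-2 average while the second harmonic survives as a slowly varying term, so only the resonant averaged map tracks the exact orbit to $O(\varepsilon)$.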

Remark 2.5 Propositions A and B characterize the motion of $x_n$ to within $O(\varepsilon^{1-\lambda})$ for $\nu$ away from low-order
rationals, i.e., outside of $O(\varepsilon^\lambda \phi(p)/p)$ neighborhoods of rationals $q/p$ with $0 < p \le R_\varepsilon$. For these $\nu$ the
nonresonant normal form of Eq. (1.6) applies. Proposition C characterizes the motion to within $O(\varepsilon p)$ for $\nu$
inside $O(\varepsilon)$ neighborhoods of $q/p$. For these $\nu$ the resonant normal form of Eq. (1.9) applies. What is missing
is information about the motion for $\nu$ in the gaps between the domains of validity of the resonant normal form
and the nonresonant normal form. The size of the gaps decreases to zero as $\lambda \nearrow 1$; however, the error in the
nonresonant normal form simultaneously deteriorates to $O(1)$. High-order rationals, i.e. $q/p$ with $p > R_\varepsilon$, are
of course treated using Proposition B. It is interesting to note that they may also be treated using Proposition
C; however, the $O(p\varepsilon + \varepsilon)$ error bound deteriorates to $O(1)$ as $p$ approaches $O(1/\varepsilon)$.

3. Examples from Accelerator Beam Dynamics 

Modern particle accelerators operate at the limits of current technology, and their design and operation depend 
crucially on an understanding of the dynamics of particle beams. In this section we give examples showing how 
Theorems 1 and 2 (supplemented by Propositions A, B and C) may be used to analyze a class of beam dynamics 
models, and how Theorem 3 may be used to analyze the Hénon map (which is itself a model of certain features in
beam dynamics). In fact, our averaging principles for maps have features that make them especially effective for
this purpose; namely, they compare solutions of the exact and averaged model problems in the simplest possible
way, and produce rigorous mathematical bounds on the difference between these solutions in an essentially
optimal fashion. Although $O(1/\varepsilon)$ times may be short by accelerator standards (and adiabatic invariance
of actions on $O(1/\varepsilon^2)$ times is perhaps ideal), we see our work here as an important step in understanding
the dynamics of maps on long timescales. We emphasize that these are rigorous error bounds and not error
estimates. Comparisons between simulations and the averaging approximations indicate that the error bounds
hold on much longer time intervals. 

We point out that this section extends certain results of [ES] in at least two important ways: first, by 
using maps, we are able to incorporate delta function "kicks" that could not be treated rigorously via the 
flow methods of [ES]; second, the truncated Diophantine conditions used here are more physically realistic and 
explicit than the small divisor conditions used there (cf. §4.2.1). Finally, we note that our maps need not be 
polynomial here; this is particularly important for the weak-strong beam-beam problem where the perturbation 
is not polynomial. (We also remind the reader of our discussion of this section in the Introduction.) 

We begin in Subsection 3.1 with a general "kick-rotate" model in one degree of freedom. In Subsection 3.2 
we apply the results of Subsection 3.1 to the important case of the weak-strong beam-beam interaction, and in 
Subsection 3.3 we apply Theorem 3 and Proposition B to the Hénon map.

3.1 The One Degree of Freedom Kick-Rotate Model

In this subsection, for purposes of illustration we focus on a simple but widely used class of beam dynamics
models: the so-called one degree of freedom "kick-rotate" models. We note, however, that our methods may be
generalized to treat models with several degrees of freedom and at higher order (this will be the subject of a
future publication [DEVS]).

A circular accelerator (in storage mode) has a closed orbit, that is, there exists a unique solution of the
equations of motion which has the periodicity of the (circular) accelerator. A complete, three-degree-of-freedom
description of single-particle beam dynamics involves three spatial coordinates in the co-moving (Frenet-Serret)
system defined by the projection of the closed orbit on configuration space, and their three conjugate momenta.
It is convenient to study the dynamics in terms of a Poincaré map (one-turn map) at a fixed azimuthal location
in the ring. Here we consider one transverse degree of freedom and let $w_1$ and $w_2$ denote the spatial coordinate
and conjugate momentum in the Poincaré section. The base-model consists of a "rotation with unperturbed
tune $\nu$" representing the linear "betatron motion." Perturbations of this model often consist of an instantaneous
change in momentum $w_2$ at a fixed location in the ring, which depends only on the spatial coordinate $w_1$ (a
"kick-map"). If we take this fixed location to be the azimuthal position of the Poincaré section, then the
perturbed dynamics is given by the so-called "kick-rotate" model

$$w_{n+1} = R w_n + \varepsilon R \begin{pmatrix} 0 \\ -H'(w_{1,n}) \end{pmatrix}, \quad \text{where } R := e^{\mathcal{J} 2\pi\nu}, \tag{3.1}$$

that is, a kick followed by a rotation through the angle $2\pi\nu$ about the origin. Here $\mathcal{J} := \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is the
unit symplectic matrix and $H'$ is the "kick function." Since $R$ depends only on the fractional part of $\nu$ we shall
assume $\nu \in [0, 1]$ in the following. The map defined by Eq. (3.1) is symplectic since it is the composition of
symplectic maps. The notation $w_{1,n}$ indicates the first component of the vector $w_n = (w_1, w_2)^T$ (we hope the
reader will forgive us the ambiguity of using $w_n$ to denote a vector and $w_1$ or $w_{1,n}$ its first component, and $w_2$
or $w_{2,n}$ its second component; the meaning should be clear from context, since we rarely explicitly set $n = 1$ or
$n = 2$).

For $R = 1$, i.e. $\nu \in \{0, 1\}$, Eq. (3.1) is easily solved and gives $w_n = (w_{1,0},\, w_{2,0} - n\varepsilon H'(w_{1,0}))^T$ and thus $|w_{2,n}|$ is
monotonically increasing to infinity. For $R = -1$ (i.e., $\nu = 1/2$), $w_{2n} = (w_{1,0},\, w_{2,0} - 2n\varepsilon H'(w_{1,0}))^T$ when $H'$ is odd, and the motion is
again unbounded. Thus for $\nu \in \{0, 1/2, 1\}$ and for all initial conditions where $H'(w_{1,0}) \ne 0$, the distance from
the origin is monotonically increasing. The basic question is: what happens for general $\nu$? We shall apply the
results of Section 2 to answer this question for most $\nu$ in $[0, 1]$.

Eq. (3.1) may be written as

$$w_{n+1} = R w_n + \varepsilon R F(w_n) \tag{3.2}$$

and the transformation $w_n = R^n x_n$ recasts Eq. (3.2) as:

$$x_{n+1} = x_n + \varepsilon R^{-n} F(R^n x_n) =: x_n + \varepsilon f(x_n, n\nu), \tag{3.3}$$

which is in the standard form for averaging (cf. Eq. (1.4)).
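The change of variables can be verified directly by iterating both maps. In the sketch below the kick function $H'(u) = u^3$, the tune, and the initial point are hypothetical choices of ours, and the rotation is taken as $e^{\mathcal{J}\theta}$ with the unit symplectic matrix having rows $(0, 1)$ and $(-1, 0)$, which is the sign convention consistent with the formula for $f(x, \theta)$ given below.

```python
import math

def rot(angle):
    # Matrix exponential e^{J*angle} for J = [[0, 1], [-1, 0]].
    c, s = math.cos(angle), math.sin(angle)
    return ((c, s), (-s, c))

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def kick(w, dH):
    # F(w) = (0, -H'(w_1))^T
    return (0.0, -dH(w[0]))

def step_w(w, eps, nu, dH):
    # Eq. (3.2): w_{n+1} = R w_n + eps R F(w_n) = R (w_n + eps F(w_n))
    k = kick(w, dH)
    return apply(rot(2 * math.pi * nu), (w[0] + eps * k[0], w[1] + eps * k[1]))

def step_x(x, n, eps, nu, dH):
    # Eq. (3.3): x_{n+1} = x_n + eps R^{-n} F(R^n x_n)
    angle = 2 * math.pi * nu * n
    k = apply(rot(-angle), kick(apply(rot(angle), x), dH))
    return (x[0] + eps * k[0], x[1] + eps * k[1])

def max_mismatch(eps=0.01, nu=0.618, steps=200, w0=(0.5, 0.0), dH=lambda u: u**3):
    """max over n of |w_n - R^n x_n| when both maps start from w0."""
    w = x = w0
    worst = 0.0
    for n in range(steps + 1):
        rx = apply(rot(2 * math.pi * nu * n), x)
        worst = max(worst, math.hypot(w[0] - rx[0], w[1] - rx[1]))
        w, x = step_w(w, eps, nu, dH), step_x(x, n, eps, nu, dH)
    return worst

if __name__ == "__main__":
    print(max_mismatch())  # expected to be at rounding-error level
```

The mismatch stays at the level of floating-point rounding, as it must, since the two recursions are algebraically equivalent.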

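It is easy to see that $f(x, \theta) = H'(x_1 \cos 2\pi\theta + x_2 \sin 2\pi\theta)\,(\sin 2\pi\theta, -\cos 2\pi\theta)^T = (\partial\mathcal{H}/\partial x_2, -\partial\mathcal{H}/\partial x_1)^T$.
Thus if we define $\mathcal{H}(x, \theta) := H(x_1 \cos 2\pi\theta + x_2 \sin 2\pi\theta)$, then Eq. (3.3) becomes

$$x_{n+1} = x_n + \varepsilon \mathcal{J} \nabla_x \mathcal{H}(x_n, n\nu). \tag{3.4}$$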

Equations (3.3) and (3.4) also define symplectic maps, since the transformation is symplectic. 

3.1.1 The kick-rotate model in the far-from-low-order-resonance case

In this subsection, we examine the behavior of the kick-rotate model (3.1) in the case where the tune
belongs to the $\varepsilon$-dependent truncated Diophantine set $\mathcal{D}(\varepsilon^\lambda \phi, R_\varepsilon)$. In physical terms, this means that the tune
is "far from low-order resonance."

The most useful form of $\mathcal{H}$ in Eq. (3.4) is given in terms of the Fourier series $H(\sqrt{2J} \sin 2\pi t) =
\sum_{k \in \mathbb{Z}} H_k(J) e^{i 2\pi k t}$, from which it follows that $\mathcal{H}(x, n\nu) = \sum_{k \in \mathbb{Z}} H_k(J(x))\, e^{i 2\pi k (\Phi(x) + n\nu)}$, where $\Phi$ and $J$ are
defined by $x_1 = \sqrt{2J} \sin(2\pi\Phi)$ and $x_2 = \sqrt{2J} \cos(2\pi\Phi)$. The averaged problem is then

$$y_{n+1} = y_n + \varepsilon \mathcal{J} \nabla_y H_0(J(y_n)), \tag{3.5}$$

where $H_0(J) = \int_0^1 H(\sqrt{2J} \sin 2\pi t)\,dt$. The associated (scaled) flow problem is

$$\frac{dy}{dt} = 2\pi\omega(J(y))\, \mathcal{J} y, \qquad y(0) = x_0 \tag{3.6}$$

where $2\pi\omega(J) = H_0'(J)$. We note that the map defined in Eq. (3.5) is only symplectic through $O(\varepsilon)$; however,
the vector field in Eq. (3.6) is Hamiltonian with Hamiltonian $H_0(J(y))$. It is easy to check that $J(y) = \tfrac{1}{2}(y_1^2 + y_2^2)$
is constant along orbits so that $J(y) = J_0 = J(x_0)$ and thus $y(t) = e^{\mathcal{J} 2\pi\omega(J_0) t} x_0$. Finally, Theorem 2 together
with Propositions A and B give

$$w_n = e^{\mathcal{J} 2\pi n (\nu + \varepsilon\omega(J_0))} x_0 + O(\varepsilon^{1-\lambda}) \tag{3.7}$$

for $0 \le n \le T/\varepsilon$, with $\varepsilon$ suitably restricted as in Proposition B for non-compactly supported perturbations,
with $\lambda \in [0, 1)$, $\nu \in \mathcal{D}(\varepsilon^\lambda \phi, R_\varepsilon)$, and with $R_\varepsilon$ defined by the condition

$$\sum_{|k| > R_\varepsilon} \Big( \|H_k'(J) \nabla_x J\|_{D(\delta)} + \|H_k(J)\, 2\pi \nabla_x \Phi\|_{D(\delta)} \Big) \le C\varepsilon, \tag{3.8}$$

where $D(\delta)$ is the $\delta$-tube around the solution of Eq. (3.6) (see the definition of the $\delta$-tube in §4.3.2).
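As a rough numerical illustration of the twist-map approximation, the sketch below compares kick-rotate orbits with a pure rotation at the amplitude-dependent tune $\nu + \varepsilon\omega(J_0)$. The quartic kick potential $H(u) = u^4/4$ is a hypothetical choice of ours (it is not compactly supported, so Proposition B is implicitly invoked); for it one finds $H_0(J) = 3J^2/8$ and hence $\omega(J) = 3J/(8\pi)$. The golden-mean tune and the initial point are likewise assumptions of this example.

```python
import math

def rot(angle):
    # Matrix exponential e^{J*angle} for J = [[0, 1], [-1, 0]].
    c, s = math.cos(angle), math.sin(angle)
    return ((c, s), (-s, c))

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def kick_rotate_orbit(w0, eps, nu, steps, dH):
    """Iterate w_{n+1} = R (w_n + eps * (0, -H'(w_{1,n}))^T), cf. Eq. (3.1)."""
    R = rot(2 * math.pi * nu)
    w, orbit = w0, [w0]
    for _ in range(steps):
        w = apply(R, (w[0], w[1] - eps * dH(w[0])))
        orbit.append(w)
    return orbit

def twist_error(eps, nu=(math.sqrt(5) - 1) / 2, T=1.0, w0=(0.5, 0.0)):
    """max over 0 <= n <= T/eps of |w_n - e^{J 2 pi n (nu + eps*omega(J0))} x0|."""
    dH = lambda u: u**3                     # hypothetical kick, H(u) = u^4 / 4
    J0 = 0.5 * (w0[0] ** 2 + w0[1] ** 2)
    omega = 3 * J0 / (8 * math.pi)          # from H_0(J) = 3 J^2 / 8
    orbit = kick_rotate_orbit(w0, eps, nu, int(T / eps), dH)
    err = 0.0
    for n, w in enumerate(orbit):
        a = apply(rot(2 * math.pi * n * (nu + eps * omega)), w0)
        err = max(err, math.hypot(w[0] - a[0], w[1] - a[1]))
    return err

if __name__ == "__main__":
    for eps in (0.02, 0.01):
        print(eps, twist_error(eps))
```

The printed errors should scale roughly linearly with `eps`, in line with the $O(\varepsilon^{1-\lambda})$ estimate for $\lambda = 0$.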

3.1.2 The kick-rotate model in the near-to-low-order-resonance case 

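For $\nu$ near low-order resonance, we write $\nu = \frac{q}{p} + \varepsilon a$ when $p$ is not too large (more precisely, when $0 < p \le R_\varepsilon$,
with $R_\varepsilon$ determined by (3.8)). Thus using Eq. (1.7), our problem becomes

$$\begin{pmatrix} x_{n+1} \\ \tau_{n+1} \end{pmatrix} = \begin{pmatrix} x_n + \varepsilon \mathcal{J} \nabla_x \mathcal{H}(x_n, n\tfrac{q}{p} + \tau_n) \\ \tau_n + \varepsilon a \end{pmatrix}. \tag{3.9}$$

We are now in the periodic case, with averaged Hamiltonian $\bar{\mathcal{H}}(x, \tau) = (1/p) \sum_{n=0}^{p-1} H(x_1 \cos(2\pi[n\tfrac{q}{p} + \tau]) +
x_2 \sin(2\pi[n\tfrac{q}{p} + \tau]))$. The averaged problem is $(y_{n+1}, \tau_{n+1}) = (y_n + \varepsilon \mathcal{J} \nabla_y \bar{\mathcal{H}}(y_n, \tau_n),\ \tau_n + \varepsilon a)$, with its associated
scaled flow $(dy/dt, d\tau/dt) = (\mathcal{J} \nabla_y \bar{\mathcal{H}}(y, \tau), a)$. Solving for $\tau$ gives $\frac{dy}{dt} = \mathcal{J} \nabla_y \bar{\mathcal{H}}(y, at)$. Theorem 1 with
Propositions B and C then give

$$w_n = e^{\mathcal{J} 2\pi \nu n}\, y(\varepsilon n) + O(\varepsilon) \tag{3.10}$$

for $0 \le n \le T/\varepsilon$ and for $\nu = \frac{q}{p} + \varepsilon a$. However, it is not clear we have achieved a great simplification and so we
look more closely. It turns out that $\bar{\mathcal{H}}(\exp(-\mathcal{J} 2\pi\theta') y, \theta) = \bar{\mathcal{H}}(y, \theta - \theta')$, which suggests that an autonomous
Hamiltonian system might be found with the symplectic transformation $y \mapsto z$ defined by $y = e^{-\mathcal{J} 2\pi a t} z$. This
is indeed true and gives the autonomous system

$$\frac{dz}{dt} = 2\pi a \mathcal{J} z + \mathcal{J} \nabla_z \bar{\mathcal{H}}(z, 0) \tag{3.11}$$

with Hamiltonian $\mathcal{K}(z) = 2\pi a J(z) + \bar{\mathcal{H}}(z, 0)$. Equation (3.10) thus becomes

$$w_n = e^{\mathcal{J} 2\pi \frac{q}{p} n}\, z(\varepsilon n) + O(\varepsilon) \tag{3.12}$$

from which the behavior of the approximation is now quite transparent.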
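For completeness, the step leading to the autonomous system (3.11) can be sketched as follows. This is a short computation under the conventions assumed here: $\mathcal{J}$ denotes the unit symplectic matrix (so that $e^{\mathcal{J}\theta}$ is orthogonal and commutes with $\mathcal{J}$), $\bar{\mathcal{H}}$ denotes the averaged Hamiltonian, and $z = e^{\mathcal{J} 2\pi a t} y$ where $y$ solves the scaled non-autonomous flow.

```latex
% Sketch: z = e^{J 2 pi a t} y, where dy/dt = J grad_y Hbar(y, a t).
\begin{align*}
\frac{dz}{dt}
  &= 2\pi a\,\mathcal{J} z
     + e^{\mathcal{J} 2\pi a t}\,\mathcal{J}\,\nabla_y \bar{\mathcal{H}}(y, at)
     && \text{(product rule)} \\
  &= 2\pi a\,\mathcal{J} z
     + \mathcal{J}\,\nabla_z \Big[ \bar{\mathcal{H}}\big(e^{-\mathcal{J} 2\pi a t} z,\ at\big) \Big]
     && \text{(chain rule; } (e^{-\mathcal{J}\theta})^T = e^{\mathcal{J}\theta}\text{)} \\
  &= 2\pi a\,\mathcal{J} z + \mathcal{J}\,\nabla_z \bar{\mathcal{H}}(z, 0)
     && \text{(shift identity with } \theta = \theta' = at\text{)}.
\end{align*}
```

The first term is $\mathcal{J}\nabla_z$ of $2\pi a J(z)$ with $J(z) = \tfrac{1}{2}(z_1^2 + z_2^2)$, which is why the Hamiltonian of the autonomous system is the sum of $2\pi a J(z)$ and the averaged Hamiltonian evaluated at zero phase.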

3.1.3 Summary of the kick-rotate model 

We now have the following picture of the solutions of Eq. (3.1) on $O(1/\epsilon)$ time intervals. For $\nu \in \mathcal{D}(\epsilon^\lambda\phi, R_\epsilon)$ the motion is given by Eq. (3.7) and thus our kick-rotate map behaves like a twist map with tune $\nu + \epsilon\omega(J_0)$. For these $\nu$ the effect of the perturbation is slight; the up and down kicks on the integral curves essentially cancel and the main effect of the perturbation is to create an amplitude-dependent tune. For $\nu = q/p + \epsilon a$, we see that in the $p$-periodic Poincaré map, the approximate motion moves slowly along the phase curves given by the level curves of $\mathcal{K}(z)$. We thus have an essentially complete picture of the motion (except for small gaps in $\nu$ as discussed in Remark 2.5).
3.2 The Weak-Strong Beam-Beam Effect

As a concrete example, we study the weak-strong beam-beam effect for round Gaussian beams in collider 
rings. We treat the lattice (the sequence of transport maps through the various components of the accelerator) 
as a stable, linear symplectic map, and the beam-beam interaction as localized at the point of the ring where the 
bunches collide (the "interaction point" ) . The phase space distribution of the strong beam at the interaction 
point is assumed to be stationary; in particular the beam-beam effect of the weak beam on the strong beam 
is ignored. Therefore the beam-beam effect on the particle trajectories of the weak beam may be treated in 
the single particle picture, i.e., as a nonlinear kick due to the electromagnetic forces experienced while passing 
through a (longitudinally) short and time-independent external charge distribution. We ignore coupling to 
the longitudinal motion, and we assume that the strong beam is represented by an axially symmetric charge 
distribution around the common closed orbit of the two beams in the transverse coordinate plane, so that it 
suffices to study a single phase plane. We start by stating the model in the so-called canonical accelerator 
coordinates $v = (v_1, v_2)^T$, where $v_1$ has the dimension of a length and $v_2 := p_{v_1}/p_0$ is dimensionless ($p_0$ is the constant longitudinal momentum of the particle on the closed orbit, usually much larger than $p_{v_1}$, the canonical conjugate of $v_1$). Normally the lattice is chosen so that the unperturbed beam envelope at the interaction point has a local minimum, and thus the linear lattice is represented by

$$M := \begin{pmatrix} \cos(2\pi Q_0) & \beta\sin(2\pi Q_0) \\ -\sin(2\pi Q_0)/\beta & \cos(2\pi Q_0) \end{pmatrix},$$

where $Q_0 \in \mathbb{R}$ and $\beta > 0$ are the unperturbed tune and the unperturbed beta-function of the weak beam at the interaction point, respectively (the beam envelope has width of order $\sqrt{\beta}$). The beam-beam kick is

given by $v_2 \mapsto v_2 - \eta K(v_1)$, where $\eta := 8\pi\xi(\sigma_1^*)^2/\beta$ and where $K(v_1) := \frac{1}{v_1}\left(1 - \exp\left(-\frac{v_1^2}{2(\sigma_1^*)^2}\right)\right)$. Here $\sigma_1^*$ is the spatial standard deviation of the Gaussian representing the strong beam, and $\xi$ is the (typically small) linear beam-beam tune shift parameter. Our difference equation in the accelerator coordinates now reads $v_{n+1} = Mv_n + \eta M(0, -K(v_{1,n}))^T$.

Remark 3.1 In the special case of two matched, axially symmetric Gaussian beams, $\xi$ is given by $\xi = \pm N^* r_p\beta/(4\pi\gamma\sigma_1^2)$, where $N^*$ is the number of particles in the strong beam, $r_p$ is the so-called classical particle radius of the species, $\sigma_1$ is the spatial RMS beam width of the two beams, and $\gamma > 1$ is the Lorentz factor of the weak beam.

We now rescale the variables according to $w = (w_1, w_2)^T := (v_1/\sigma_1, \beta v_2/\sigma_1)^T$, where $\sigma_1$ is the standard deviation of $v_1$ for the weak beam when matched to its unperturbed lattice (i.e., when the phase space density depends only on $v^TB^{-1}v$, where $B := \mathrm{diag}(\beta, 1/\beta)$ is the beam matrix at the interaction point; note that $\sigma_2 := \sigma_1/\beta$ is then the standard deviation of $v_2$ for the weak beam). In the rescaled variables the difference equation becomes

$$w_{n+1} = R\,w_n + \epsilon R\begin{pmatrix} 0 \\ -H'(w_{1,n}) \end{pmatrix}, \qquad (3.13)$$

where $R := e^{\mathcal{J}2\pi Q_0}$, $\epsilon := 8\pi r^2\xi$, $r := \sigma_1^*/\sigma_1$ (the ratio of the strong-beam width to the weak-beam width), and $H'(w_1) := \frac{1}{w_1}\left(1 - \exp\left(-\frac{w_1^2}{2r^2}\right)\right)$. Thus Eq. (3.13) has the form of Eq. (3.1). We note that $\epsilon$ is dimensionless and small whenever $\xi$ is small, that $w_1$ and $w_2$ are dimensionless and $O(1)$ for a typical particle trajectory of the weak beam, and that in a collider the two beams are typically matched to each other so that $r \approx 1$. By using the substitution $s^2/(2r^2) = w_1^2/(2r^2+s')$ one can show that

$$H(w_1) = \int_0^{w_1}\left(1 - \exp\left(-\frac{s^2}{2r^2}\right)\right)\frac{ds}{s} = \frac12\int_0^\infty\left(1 - \exp\left(-\frac{w_1^2}{2r^2+s'}\right)\right)\frac{ds'}{2r^2+s'}, \qquad (3.14)$$

where we have taken $H(0) = 0$.

Before proceeding we check the linearized behavior about the equilibrium $w = 0$. The linearization of Eq. (3.13) is $w_{n+1} = Gw_n$, $G := R\begin{pmatrix} 1 & 0 \\ -4\pi\xi & 1 \end{pmatrix}$, where we have used the fact that $H''(0) = (2r^2)^{-1}$, so that $\epsilon H''(0) = 4\pi\xi$. The system is linearly stable if and only if $|\mathrm{tr}\,G| < 2$, i.e., provided the linearly perturbed tune $Q$, defined by $\cos(2\pi Q) := \frac12\mathrm{tr}\,G = \cos(2\pi Q_0) - 2\pi\xi\sin(2\pi Q_0)$, is real and satisfies $|\cos(2\pi Q)| < 1$. It follows that $Q = Q_0 + \xi + O(\xi^2)$, thus justifying the name "linear beam-beam tune shift parameter" for $\xi$. For $Q_0 \in \{0, 1/2, 1\}$ we see that $|\mathrm{tr}\,G| = 2$, consistent with the discussion in the paragraph immediately following Eq. (3.1). For $Q_0 \in \{1/4, 3/4\}$, $|\mathrm{tr}\,G| = 4\pi|\xi|$, which is less than 2 for small $\xi$, and thus we have linear stability, which is consistent with the results of Subsection 3.2.2.
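
The linear stability check can be reproduced in a few lines. This is a minimal sketch, assuming only the trace formula $\frac12\mathrm{tr}\,G = \cos(2\pi Q_0) - 2\pi\xi\sin(2\pi Q_0)$ stated above; the function name `linear_tune` and the choice of the arccosine branch (which returns $Q$ in $[0, 1/2]$) are illustrative.

```python
import math

def linear_tune(Q0, xi):
    """Return (tr G, Q) for the linearized beam-beam map; Q is None when |tr G| >= 2."""
    c, s = math.cos(2.0*math.pi*Q0), math.sin(2.0*math.pi*Q0)
    half_trace = c - 2.0*math.pi*xi*s
    if abs(half_trace) >= 1.0:
        return 2.0*half_trace, None              # not linearly stable
    Q = math.acos(half_trace)/(2.0*math.pi)      # principal branch, Q in [0, 1/2]
    return 2.0*half_trace, Q
```

For $Q_0$ in $(0, 1/2)$ and small $\xi$ the returned $Q$ should differ from $Q_0 + \xi$ only at order $\xi^2$, while $Q_0 = 0$ sits exactly on the stability boundary.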
3.2.1 The weak-strong beam-beam effect in the far-from-low-order-resonance case

For $Q_0 \in \mathcal{D}(\epsilon^\lambda\phi, R_\epsilon)$ the motion is given by Eq. (3.7), where $\nu = Q_0$, and where $\omega$ is determined as follows. We use Eq. (3.14) to obtain $H_0(J) := \int_0^1 H(\sqrt{2J}\sin(2\pi t))\,dt = \frac12\int_0^{J/(2r^2)}\left(1 - e^{-w}I_0(w)\right)\frac{dw}{w}$, where $I_0$ is the zero-th order modified Bessel function and where we have used the expansion $\exp(x\cos(y)) = I_0(x) + 2\sum_{k=1}^\infty I_k(x)\cos(ky)$. Thus $\omega$ is given by

$$2\pi\omega(J) := H_0'(J) = \frac{1}{2J}\left[1 - \exp\left(-\frac{J}{2r^2}\right)I_0\left(\frac{J}{2r^2}\right)\right]. \qquad (3.15)$$

The amplitude-dependent tune shift $\epsilon\omega(J_0)$ is identical to that derived in [ES] and justifies the use of the delta function there. Notice also that $\epsilon\omega(0) = \xi$, in agreement with the linearization above.
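
As a rough numerical illustration, the predicted tune $Q_0 + \epsilon\omega(J_0)$ can be compared with a tune measured by iterating the rescaled map. The sketch below assumes the kick-then-rotate form $w_{n+1} = R(w_n + \epsilon(0, -H'(w_{1,n}))^T)$ of Eq. (3.13) with $R = e^{\mathcal{J}2\pi Q_0}$ acting as a clockwise rotation; the power-series evaluation of $I_0$ and the angle-averaging tune estimate are illustrative choices, not the paper's method.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0 via its power series sum_m ((x/2)^m / m!)^2."""
    term, total, m = 1.0, 1.0, 0
    while term > 1e-17*total:
        m += 1
        term *= (0.5*x/m)**2
        total += term
    return total

def omega(J, r=1.0):
    """omega(J), from 2*pi*omega(J) = (1/(2J)) * (1 - exp(-u)*I0(u)) with u = J/(2 r^2)."""
    u = J/(2.0*r*r)
    return (1.0 - math.exp(-u)*bessel_i0(u))/(2.0*J)/(2.0*math.pi)

def measured_tune(Q0, xi, w0, turns, r=1.0):
    """Average clockwise phase advance per turn (in units of 2*pi) of the rescaled map."""
    eps = 8.0*math.pi*r*r*xi
    c, s = math.cos(2.0*math.pi*Q0), math.sin(2.0*math.pi*Q0)
    w1, w2 = w0
    total = 0.0
    for _ in range(turns):
        old = math.atan2(w2, w1)
        kick = -math.expm1(-w1*w1/(2.0*r*r))/w1 if w1 != 0.0 else 0.0   # H'(w1)
        k2 = w2 - eps*kick                        # beam-beam kick on w2
        w1, w2 = c*w1 + s*k2, -s*w1 + c*k2        # rotation by R = exp(J*2*pi*Q0)
        total += (old - math.atan2(w2, w1)) % (2.0*math.pi)
    return total/(2.0*math.pi*turns)
```

With, say, $Q_0 = 0.31$, $\xi = 0.002$ and $w_0 = (1, 0)$ (so $J_0 = 1/2$), the measured tune is expected to lie close to $Q_0 + \epsilon\omega(J_0)$, and $\epsilon\omega(J)$ should approach $\xi$ as $J \to 0$.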

3.2.2 The weak-strong beam-beam effect in the near-to-low-order-resonance case 

In Subsection 3.1.2 we found the Hamiltonian for the autonomous system (3.11) to be $\mathcal{K}(z) = 2\pi aJ(z) + \bar{\mathcal{H}}(z,0)$, where $\bar{\mathcal{H}}(z,0) = (1/p)\sum_{n=0}^{p-1}H(z_1\cos[2\pi nq/p] + z_2\sin[2\pi nq/p])$ and $Q_0 = q/p + a\epsilon$. Since $H'(x)$ approaches zero for large $|x|$ (so that $H$ grows only logarithmically), $\mathcal{K}(z)$ is dominated by $2\pi aJ(z)$ far from the origin, and for $a \ne 0$ the integral curves become circles at large distances from the origin. The motion on these circles is clockwise for positive $a$ and counterclockwise for negative $a$, thus a bifurcation in the phase plane portrait occurs at $a = 0$. In the case where $q/p \in \{0, 1/2, 1\}$ it is easy to see that $\bar{\mathcal{H}}(z,0) = H(z_1)$, and for $q/p \in \{1/4, 3/4\}$ one also easily finds $\bar{\mathcal{H}}(z,0) = \frac12[H(z_1) + H(z_2)]$ since $H$ is an even function. For $q/p \in \{1/3, 2/3\}$ we find $\bar{\mathcal{H}}(z,0) = \frac13[H(z_1) + H(-z_1/2 + \sqrt3z_2/2) + H(-z_1/2 - \sqrt3z_2/2)]$. We briefly discuss the phase plane portraits for $\mathcal{K}$ in these cases (see [DEV] for more figures).

In the first case ($q/p \in \{0, 1\}$) and for $a = 0$ we have $dz_1/dt = 0$ and $dz_2/dt = -H'(z_{1,0})$. Thus the motion is identical to the exact case, as discussed just before Eq. (3.2), since Eqs. (3.1) and (3.3) and the associated averaged problem are identical. For $a$ small but positive, the origin is a (nonlinearly) stable center and the phase portrait is a one-parameter family of ovals which are long and thin in the $z_2$ and $z_1$ directions respectively. As $a$ increases to modest values the ovals become circular, consistent with the expectation of "stability far from low-order resonance." As $a$ decreases from zero, the origin becomes a saddle, and two centers emerge from infinity at $(\pm c, 0)$, where $c \approx 1/\sqrt{2\pi|a|}$ for $|a|$ small. As $a$ decreases further, the centers coalesce with the saddle at $4\pi ar^2 = -1$, and for $4\pi ar^2 < -1$ the only critical point is a center at the origin, again consistent with our expectation of stability (see Figure 1).

The motion for $q/p = 1/2$ in the period two Poincaré map is identical with the motion for $q/p = 1$; the intermediate values may be obtained by rotating the phase plane portrait by a half turn (also see Figure 1).



Figure 1: The qualitative phase plane portraits for $r = 1$ in the case $q/p \in \{0, 1/2, 1\}$. Panels: $a \gg 0$, $a > 0$, $a = 0$, $a < 0$, $a = -1/(4\pi)$, $a \ll 0$.

For $q/p \in \{1/4, 3/4\}$ the phase plane portrait (see Figure 2) has a four-fold symmetry, being invariant under reflections about the two axes and about the lines $z_2 = \pm z_1$. The origin is a critical point and its linearized vector field has eigenvalues $\pm2\pi i(a - a_c)$, where $a_c = -1/(8\pi r^2)$. Thus the origin is a (nonlinearly) stable center for $a \ne a_c$, and it is easily checked that the origin is also a stable center for $a = a_c$ and that the rotation is clockwise for $a > a_c$ and counterclockwise for $a < a_c$. For $a > 0$ there are no other equilibria and the phase plane portrait is a one-parameter family of concentric ovals. For $a$ small the (closed) integral curves look like four-pointed stars, with smoothed points on the axes, and as $a$ increases the curves become circles. For $a_c < a < 0$ there are eight nonzero critical points. The four critical points $(\pm c, \pm c)$ are centers and the four at $(0, \pm c)$ and $(\pm c, 0)$ are saddle points, where $c$ is the unique positive root of $4\pi ac + H'(c) = 0$. The critical points form an island structure in a neighborhood of radius $c$ of the origin in the phase plane. This island structure emerges from infinity as $a$ decreases through zero and coalesces into the origin as $a$ decreases to $a_c$. For $a < a_c$, the origin is again the only equilibrium, and it is a stable center with counterclockwise rotation. The portrait is again a one-parameter family of ovals approaching circles as $a$ decreases from $a_c$.
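
The island radius $c$ in the order-four case can be located numerically. The following is a minimal sketch, assuming the beam-beam kick $H'(w) = (1 - e^{-w^2/(2r^2)})/w$ from Eq. (3.13); the bisection bracket and the name `island_radius` are illustrative choices.

```python
import math

def H_prime(w, r=1.0):
    """Beam-beam kick function H'(w) = (1 - exp(-w^2/(2 r^2)))/w."""
    return -math.expm1(-w*w/(2.0*r*r))/w

def island_radius(a, r=1.0, c_max=1.0e3):
    """Bisect for the positive root c of 4*pi*a*c + H'(c) = 0; None outside a_c < a < 0."""
    a_c = -1.0/(8.0*math.pi*r*r)
    if not (a_c < a < 0.0):
        return None
    F = lambda c: 4.0*math.pi*a*c + H_prime(c, r)
    lo, hi = 1.0e-6, c_max                 # F(lo) > 0 because a > a_c
    if F(hi) >= 0.0:
        raise ValueError("bracket too small for this value of a")
    for _ in range(200):
        mid = 0.5*(lo + hi)
        if F(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5*(lo + hi)
```

For $r = 1$ and $a = -0.005$ the root should lie near $1/\sqrt{4\pi|a|} \approx 3.99$, since $H'(c) \approx 1/c$ for large $c$.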

Figure 2: The phase plane portraits for $r = 1$ in the case $q/p \in \{1/4, 3/4\}$. Panels, left to right: $a = -0.050$, $a = a_c = -1/(8\pi)$, $a = -0.005$, $a = 0$, $a = +0.005$, $a = +0.050$.



Because $H$ is an even function, $\bar{\mathcal{H}}$ is the same for all $q/p \in \{1/6, 1/3, 2/3, 5/6\}$. Thus the phase plane portraits are the same for resonances of order three and six, and these portraits have a six-fold symmetry, being invariant under reflections about the axes $z_1 = 0$, $z_2 = 0$ and the lines $z_2 = \pm z_1/\sqrt3$ and $z_2 = \pm\sqrt3z_1$. Qualitatively, the behavior as a function of $a$ is similar to that in the case of resonance of order four (e.g., the island structure is similar, but there are now six rather than four islands). The critical value $a_c$ at which the islands coalesce in the origin turns out to be the same as in the case $p = 4$.

3.2.3 Summary of the weak-strong beam-beam effect 

Our basic equation is Eq. (3.13) with $R$ and $H'$ defined there. Remark 2.5 and the summary in Subsection 3.1.3 apply. Here we emphasize that the motion depends only on $\xi$ (or equivalently $\epsilon$), and on the fractional part of $Q_0$, and that we have a fairly complete description for $Q_0 \in [0, 1]$ over $O(1/\xi)$ time intervals. Away from low-order resonances, the motion is given by Eq. (3.7), with $\nu = Q_0$ and with $\omega$ defined in Eq. (3.15). Thus the motion takes place approximately on circles with an amplitude-dependent tune. Near low-order resonances the behavior is given by Eq. (3.12), and $z(t)$ evolves according to the time-independent Hamiltonian $\mathcal{K}(z) := 2\pi aJ(z) + (1/p)\sum_{n=0}^{p-1}H(z_1\cos(2\pi nq/p) + z_2\sin(2\pi nq/p))$. As described above in Subsection 3.2.2, this Hamiltonian has a rich variety of behaviors depending on the order $p$ of the resonance, and on the displacement $a\epsilon$ from the resonance. In particular the behavior varies considerably for $a > 0$, $a = 0$ and $a < 0$. Finally, we again emphasize that while our description is fairly complete, there are gaps between the regions of validity of the nonresonant normal form which does not depend on $Q_0$, and the resonant normal form which does depend on $Q_0$ (cf. Remark 2.5).

3.3 The Hénon Map

We now apply Theorem 3 to the Hénon map (in beam dynamics this map is a standard model for the effect of a localized sextupole magnet in an otherwise linear lattice). The standard form of the Hénon map is Eq. (3.1) with $H(w_1) = w_1^3/3$. This gives Eq. (3.4) with $\mathcal{H}(x,\theta) = (x_1\cos2\pi\theta + x_2\sin2\pi\theta)^3/3$, which clearly has zero average. It follows that $f(x,\theta) = \mathcal{J}\nabla_x\mathcal{H}(x,\theta)$ in Eq. (1.4) has zero average, so that hypothesis (jv) of Theorem 3 is satisfied. Thus, by Theorem 3 and Proposition B, for appropriate $\epsilon, T > 0$, $\nu \in \mathcal{D}(\phi, R_\epsilon)$, and for any $0 < \alpha < 1$, we have $|x_n - x_0| = O(\epsilon^\alpha)$ on the discrete time interval $0 \le n \le T/\epsilon^{2-\alpha}$.
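
The slow drift of $x_n$ can be observed directly. This sketch assumes that Eq. (3.1) has the kick-then-rotate form of Eq. (3.13), $w_{n+1} = R(w_n + \epsilon(0, -H'(w_{1,n}))^T)$ with $H'(w_1) = w_1^2$ and $R = e^{\mathcal{J}2\pi\nu}$, and that $x_n = R^{-n}w_n$; both are assumptions about parts of the paper not reproduced here, and `henon_drift` is an illustrative name.

```python
import math

def henon_drift(nu, eps, w0, turns):
    """Return max over 1 <= n <= turns of |x_n - x_0|, where x_n = R^{-n} w_n."""
    c, s = math.cos(2.0*math.pi*nu), math.sin(2.0*math.pi*nu)
    w1, w2 = w0
    worst = 0.0
    for n in range(1, turns + 1):
        k2 = w2 - eps*w1*w1                        # sextupole-like kick, H'(w1) = w1^2
        w1, w2 = c*w1 + s*k2, -s*w1 + c*k2         # rotation by R = exp(J*2*pi*nu)
        # undo the accumulated rotation: x_n = exp(-J*2*pi*n*nu) w_n
        cn, sn = math.cos(2.0*math.pi*n*nu), math.sin(2.0*math.pi*n*nu)
        x1, x2 = cn*w1 - sn*w2, sn*w1 + cn*w2
        worst = max(worst, math.hypot(x1 - w0[0], x2 - w0[1]))
    return worst
```

With $\nu$ equal to the golden mean, $\epsilon = 0.01$ and $w_0 = (0.5, 0)$, the deviation over $1000 = \epsilon^{-3/2}$ turns is expected to remain a small fraction of $|x_0|$, in line with the $O(\epsilon^\alpha)$ estimate for $\alpha = 1/2$.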

Remark 3.2 The above discussion simply applies Theorem 3 as is (and thus also covers the case of more general $H$), but when $\mathcal{H}$ has a finite Fourier series (e.g. when $H$ is a polynomial, as above) the proof of Theorem 3 may be simplified, both in terms of the smoothness requirement (see Remark 4.4) and in terms of the estimates in Lemma 2. In particular, for the Hénon map above, $g_k = 0$ except for $|k| \in \{1, 3\}$, so taking $R_\epsilon = 3$, we see that the series defining $C_1$ and $C_2$ in Lemma 2 have only four terms each, while the tail-series of Lemma 2 vanishes.

4. Proofs and Additional Mathematical Results 

As the title indicates, this is the most mathematical section of the paper. Subsection 4.1 treats periodic maps; 
this is quite straightforward, and may be read as a kind of introduction to the deeper results of the next 
subsection. Subsection 4.2 concerns the considerably more complex case of maps far from low-order resonance, 
and requires a (short) discussion of small divisors and truncated Diophantine conditions. The use of such
conditions is not new (for example, related conditions are used to obtain general multiphase averaging results
in [ABG]), but as explained in the introduction, we believe our use of them in the present context is the most 
innovative aspect of this paper from the viewpoint of applied mathematics. 
4.1 Periodic Systems

In this subsection we give a self-contained presentation of the remarkably simple technology required to 
prove the averaging principle for maps with periodic perturbations. This consists of the Besjes inequality for 
periodic functions (below), followed by its application to the proof of Theorem 1. 

4.1.1 The Besjes inequality for periodic functions 

Let $U \subset \mathbb{R}^d$ be open, and $S = U \times \mathbb{N}$. The Besjes inequality relies in an essential way upon the following assumption concerning the function $g : S \to \mathbb{R}$, periodic with period $p$ in its second argument:

(iv) For each $x \in U$, $\sum_{n=0}^{p-1} g(x,n) = 0$.

When g has period p in n and satisfies (iv), we say it has zero mean in n. We now state the Besjes inequality 
for periodic maps as 

Lemma 1. Let $U \subset \mathbb{R}^d$ be open, $S = U \times \mathbb{N}$, and suppose $g : S \to \mathbb{R}$ satisfies assumptions (i), (ii) (from §2.1) and (iv) above and is globally $x$-Lipschitz with Lipschitz constant $L > 0$. If $\{x_n\}_{n=0}^\infty \subset U$ is a sequence for which the successive differences $x_{n+1} - x_n$ are bounded by $M$ (i.e., $\sup_n|x_{n+1} - x_n| \le M$), then for all $N \in \mathbb{N}$,

$$\left|\sum_{n=0}^{N-1} g(x_n,n)\right| \le \tfrac12NpLM + p\|g\|_S.$$



Proof. Using the notation $[a]$ to designate the greatest integer in $a$, we first set $l = [(N-1)/p]$ (so that $l$ is the number of periods of $g$ contained in the segment $\{0, 1, 2, \ldots, N-1\}$). Then using the fact that $g$ is periodic and of zero mean, we write

$$\sum_{n=0}^{N-1} g(x_n,n) = \sum_{k=0}^{l-1}\sum_{n=0}^{p-1}\big(g(x_{n+kp},n) - g(x_{kp},n)\big) + \sum_{n=lp}^{N-1} g(x_n,n).$$

Now since $g$ is Lipschitz in its first argument, and since $|x_{n+kp} - x_{kp}| \le Mn$, we have

$$\left|\sum_{n=0}^{N-1} g(x_n,n)\right| \le \sum_{k=0}^{l-1}\sum_{n=0}^{p-1}LMn + \sum_{n=lp}^{N-1}|g(x_n,n)| \le lLM\frac{p(p-1)}{2} + p\|g\|_S \le \tfrac12NpLM + p\|g\|_S. \quad //$$

Remark 4.1 The original version of this lemma (Lemma 1 of [Bes]) was formulated for use in the proof of averaging principles for ODEs on $O(1/\epsilon)$ timescales, and we use its analog in a similar way below for maps. The original lemma bounds the time by a constant that is $O(1/\epsilon)$ and gives a final bound that is $O(\epsilon)$, independent of time. We have found, however, that retaining the (here discrete) time-dependence makes the result more versatile (cf. the proof of Theorem 3 below).

Remark 4.2 Lemma 1 (and many of its generalizations) may also be proved using "summation by parts," as 
in the proof of Lemma 2 below. 
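
The bound of Lemma 1, $|\sum_{n<N} g(x_n,n)| \le \frac12NpLM + p\|g\|_S$, can be checked on a concrete example. In the sketch below the test function $g(x,n) = \cos(x)\cos(2\pi n/p)$ (zero mean in $n$ for $p \ge 2$, with $L = 1$ and $\|g\|_S = 1$) and the slowly varying sequence $x_n$ are illustrative choices, not taken from the paper.

```python
import math

def besjes_check(p, M, N_max):
    """Check the Lemma 1 bound for g(x, n) = cos(x)*cos(2*pi*n/p), p >= 2, for all N <= N_max."""
    L, g_norm = 1.0, 1.0              # x-Lipschitz constant and sup norm of g
    x, partial, ok = 0.3, 0.0, True
    for n in range(N_max):
        partial += math.cos(x)*math.cos(2.0*math.pi*n/p)
        N = n + 1                     # partial now holds the sum over 0 <= n < N
        ok = ok and abs(partial) <= 0.5*N*p*L*M + p*g_norm
        x += M*math.sin(0.37*n)       # any increments with |x_{n+1} - x_n| <= M
    return ok
```

The point of the inequality is that the partial sums grow at most like $NM$ rather than like $N\|g\|_S$, which is what makes the oscillating part harmless on $O(1/\epsilon)$ timescales when $M = O(\epsilon)$.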

We now illustrate the use of Lemma 1 by using it to prove Theorem 1. 

4.1.2 Proof of Theorem 1 

Assume the hypotheses of Theorem 1 (cf. §2.1). It is clear from assumption (iii) that the solutions $x_N$ and $y_N$ exist uniquely for all $N \in \mathbb{N}$. To see that the approximation relation holds, we write

$$|x_N - y_N| = \epsilon\left|\sum_{n=0}^{N-1}\big(f(x_n,n) - \bar f(y_n)\big)\right| = \epsilon\left|\sum_{n=0}^{N-1}\big(f(x_n,n) - f(y_n,n) + f(y_n,n) - \bar f(y_n)\big)\right|$$

$$\le \epsilon L\sum_{n=0}^{N-1}|x_n - y_n| + \epsilon\left|\sum_{n=0}^{N-1}\tilde f(y_n,n)\right|,$$

where $L$ is the $x$-Lipschitz constant of $f$, and where $\tilde f(y,n) := f(y,n) - \bar f(y)$ (the "oscillating part of $f$") satisfies the hypotheses of Lemma 1 with $U = \mathbb{R}^d$ (in particular, $\tilde f$ has zero mean and $y$-Lipschitz constant $2L$). Using the fact (from Eq. (1.2) and assumption (i)) that $|y_{n+1} - y_n| \le M := \epsilon\|\bar f\|_{\mathbb{R}^d}$, we have

$$|x_N - y_N| \le \epsilon L\sum_{n=0}^{N-1}|x_n - y_n| + \epsilon\cdot\tfrac12Np\cdot2L\epsilon\|\bar f\|_{\mathbb{R}^d} + \epsilon p\|\tilde f\|_S.$$

Thus

$$|x_N - y_N| \le \epsilon L\sum_{n=0}^{N-1}|x_n - y_n| + \epsilon p\big(LT\|\bar f\|_{\mathbb{R}^d} + \|\tilde f\|_S\big)$$

for $0 \le N \le T/\epsilon$. Applying Gronwall's inequality for sequences (Lemma 3 in the Appendix) and setting $C = (LT\|\bar f\|_{\mathbb{R}^d} + \|\tilde f\|_S)e^{LT}$ gives $|x_N - y_N| \le Cp\epsilon$ for $0 \le N \le T/\epsilon$, as claimed. The second part of Theorem 1 (namely $|x_n - y(n)| \le (Cp + C')\epsilon$ for $0 \le n \le T/\epsilon$) follows from Lemma 4 (Appendix) and the triangle inequality. //
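
A small experiment illustrates the estimate $|x_N - y_N| \le Cp\epsilon$. The sketch assumes the one-dimensional example $f(x,n) = -\sin x + \cos x\cos(2\pi n/p)$, whose average is $\bar f(y) = -\sin y$; this $f$ is bounded and Lipschitz (with $L = 2$ a valid, non-sharp constant) but not compactly supported, so it is an illustration of the estimate rather than a literal instance of the hypotheses.

```python
import math

def averaging_error(eps, p=4, T=1.0, x0=0.5):
    """Max of |x_n - y_n| over 0 <= n <= T/eps for the example map and its averaged map."""
    x = y = x0
    worst = 0.0
    for n in range(int(T/eps)):
        x += eps*(-math.sin(x) + math.cos(x)*math.cos(2.0*math.pi*n/p))
        y += eps*(-math.sin(y))            # averaged map, fbar(y) = -sin(y)
        worst = max(worst, abs(x - y))
    return worst

def theorem1_bound(eps, p=4, T=1.0, L=2.0, fbar_norm=1.0, ftilde_norm=1.0):
    """C*p*eps with C = (L*T*||fbar|| + ||ftilde||)*exp(L*T), as in the proof."""
    return (L*T*fbar_norm + ftilde_norm)*math.exp(L*T)*p*eps
```

In practice the observed error is far below the bound, and it shrinks roughly in proportion to $\epsilon$, as one would expect from an $O(\epsilon)$ estimate.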

Remark 4.3 The preceding is no doubt one of the simplest possible proofs of an averaging principle for 
maps. Part of the simplicity derives from the use of Lemma 1, and part derives from the assumption of 
compact support (iii), which permits us to dispense with questions of the existence intervals for solutions. 
Thus, although assumption (iii) is often invalid in practice, by using it we are able to show that the basic 
estimates of the averaging method do not require restrictions on the size of e; such restrictions are instead 
introduced by considering solutions' existence intervals, or by methods of proof which rely on near-identity 
transformations (which may in turn require restrictions on e for their inversion). Of course our results may be 
extended to cases with finite existence intervals (see Proposition B, §2.4), and may also be combined with more 
traditional transformation methods to obtain efficient results at higher order [DESV]. 

4.2 Systems Far From Low-Order Resonance

In this subsection we generalize the Besjes inequality to functions far from low-order resonance in their second 
argument. We then use this inequality to prove Theorems 2 and 3. First, however, we present the following 
brief discussion. 

4.2.1 Resonant zones, Diophantine conditions, and the ultraviolet cutoff 

Before stating and proving our next analog of Besjes' inequality, we discuss aspects of resonance, small 
divisors and Diophantine conditions that will be needed in the sequel. A more comprehensive introduction may 
be found in [Yo]. 

Zone Functions and Diophantine Conditions 

In dynamical systems, Diophantine conditions arise naturally as a means of "controlling small divisors" and "avoiding resonances." Typically, in one dimension, divisors of the form $e^{2\pi ik\nu} - 1$ (with $0 \ne k \in \mathbb{Z}$ and $0 \ne \nu \in \mathbb{R}$) occur as the denominators of terms in a series indexed over $k$, together with numerators which decrease to zero with increasing $|k|$. Clearly divisors cannot vanish, so rational (or "resonant") values of $\nu$ must be avoided. And although irrational $\nu$ do not cause divisors to vanish, when "nearly resonant," they may generate such small divisors as to cause divergence of the series in which they occur.

By using a suitably decreasing zone function $\phi : \mathbb{R}^+ \to \mathbb{R}^+$ (the inverse of which is called an "approximation function" in [Ru]), we define the "highly nonresonant" values of $\nu$ as those belonging to the corresponding Diophantine set

$$\mathcal{D}(\phi) = \{\nu \in \mathbb{R} \mid |e^{2\pi ik\nu} - 1| \ge \phi(|k|),\ k \in \mathbb{Z}\setminus\{0\}\}, \qquad (4.1)$$

which is a Cantor set. The Diophantine set $\mathcal{D}(\phi)$ may be thought of as $\mathbb{R}$ with countably many zones removed, where the zone $Z_k = \{\nu \in \mathbb{R} \mid |e^{2\pi ik\nu} - 1| < \phi(|k|)\}$ corresponding to a particular $k \ne 0$ is the countable union of open intervals centered on rational numbers of the form $q/k$ ($q \in \mathbb{Z}$). To better see the structure of $\mathcal{D}(\phi)$, consider its intersection with the interval $[0,1]$. For each fixed $k > 0$ we remove $k$ intervals of length $2\delta$ from $[0,1]$, where $|e^{2\pi ik(\delta + l/k)} - 1| = |e^{2\pi ik\delta} - 1| < \phi(k)$. For small $\phi(k)$, this gives $\delta \approx \phi(k)/(2\pi k)$, and thus the total length of $Z_k \cap [0,1]$ is $2\delta k \approx \phi(k)/\pi$. It follows that the total length of the union $\bigcup_k Z_k \cap [0,1]$ of the overlaps of all zones $Z_k$ with $[0,1]$ is (approximately) bounded by $\sum_{k=1}^\infty \mathrm{length}(Z_k \cap [0,1]) \approx (1/\pi)\int_1^\infty \phi(k)\,dk$. Thus a typical zone function of the form $\phi(r) = \gamma r^{-(\tau+1)}$ with $\tau > 0$ removes zones of total length no more than approximately $\gamma/(\pi\tau)$ from $[0,1]$. When this total length is less than one, the Diophantine set $\mathcal{D}(\phi)$ has positive measure (and is therefore nonempty).

More generally, if the zone function $\phi$ decreases too slowly, then the union of the excluded zones may be so large that its complement, $\mathcal{D}(\phi)$, is empty. Conversely, if $\phi$ decreases too rapidly, then $\mathcal{D}(\phi)$ may be too large, and may contain values of $\nu$ so close to resonance as to cause divergence of the series in which small divisors appear.

The following terminology is useful for describing zone functions that permit convergence of the series arising in the proof of Lemma 2 below. If $U \subset \mathbb{R}^d$ is open, and $f : U \times \mathbb{R} \to \mathbb{R}^d$ has period 1 in its second argument and Fourier series $f(x,\theta) \sim \sum_{k\in\mathbb{Z}} f_k(x)e^{2\pi ik\theta}$ (where the $k$th Fourier coefficient is $f_k(x) = \int_0^1 f(x,\theta)e^{-2\pi ik\theta}\,d\theta$, requiring only that $f$ is integrable in $\theta$), then given a zone function $\phi$ such that $\mathcal{D}(\phi) \ne \emptyset$, we say that $\phi$ is adapted to $f$ on $U$ provided

$$\sum_{0\ne k\in\mathbb{Z}}\frac{\|f_k\|_U}{\phi(|k|)} < \infty \quad\text{and}\quad \sum_{0\ne k\in\mathbb{Z}}\frac{\|Df_k\|_U}{\phi(|k|)} < \infty, \qquad (4.2)$$

where $Df_k$ denotes the derivative of the function $f_k : U \to \mathbb{R}^d$. Smoothness conditions on $f$ assuring the existence of zone functions adapted to $f$ are not severe, as we now show.

Smoothness Conditions Ensuring the Existence of Adapted Zone Functions 

Several questions naturally arise concerning the relationship between the smoothness of $f$ and the existence of zone functions adapted to $f$ as in Eq. (4.2). Formulating the sharpest possible conditions in this direction is somewhat delicate, but the following brief discussion should serve as a good starting point.

We first recall that for $\tau > 0$, the zone function $\phi(r) = \gamma r^{-(\tau+1)}$ generates a nonempty Diophantine set $\mathcal{D}(\phi)$ provided $\gamma > 0$ is sufficiently small (see the preceding discussion, or the more extensive discussion in §1.2 of [BHS]). We assume that $f : U \times \mathbb{R} \to \mathbb{R}^d$ is of class $C^{p+1}(U \times \mathbb{R})$ and of compact support in the first argument, uniformly with respect to the second (cf. assumption (jjj) in §2.2). Integrating the $k$th Fourier coefficient $f_k(x) = \int_0^1 f(x,\theta)e^{-2\pi ik\theta}\,d\theta$ by parts $p$ times with respect to $\theta$ gives $f_k(x) = (2\pi ik)^{-p}\int_0^1[\partial^pf/\partial\theta^p](x,\theta)e^{-2\pi ik\theta}\,d\theta$. Then taking the supremum over $x \in U$ of both sides of this expression gives $\|f_k\|_U \le C(f,p)|k|^{-p}$, where $C(f,p) = (2\pi)^{-p}\sup_{x\in U}\int_0^1\left|\frac{\partial^pf}{\partial\theta^p}(x,\theta)\right|d\theta$. The same estimate holds for $\|Df_k\|_U$ with $C(f,p)$ replaced by $C'(f,p) = (2\pi)^{-p}\sup_{x\in U}\int_0^1\left|D\frac{\partial^pf}{\partial\theta^p}(x,\theta)\right|d\theta$.

Using these estimates, we immediately deduce that both of the series in Eq. (4.2) are convergent provided that $p > \tau + 2$. Conversely, we see that whenever $p \ge 3$, there exists a zone function $\phi(r) = \gamma r^{-(\tau+1)}$ with $0 < \tau < p - 2$ which generates nonempty Diophantine sets $\mathcal{D}(\phi)$ (for $\gamma$ sufficiently small) and which is adapted to $f$ in the sense of Eq. (4.2). This justifies our assumption (j) in Theorems 2 and 3.

Remark 4.4 A more refined (and lengthy) argument shows that the existence of $\phi$ adapted to $f$ does not require
quite as much smoothness as we demand above; we start our discussion under the assumption $f \in C^{p+1}(U \times \mathbb{R})$
primarily for simplicity. Of course, when $f$ has a (sufficiently short) finite Fourier series, the decay rate of its
terms is not an issue, and the smoothness requirement may be reduced to C . 

Remark 4.5 Although our results for system (1.4) as presented in this paper do not apply to the case of analytic perturbations $\epsilon f$ (since analytic $f$ with compact support vanishes identically), it would not be especially difficult to extend our theory to this case. For analytic $f : U \times \mathbb{T}^1 \to \mathbb{R}$ with Fourier coefficients $f_k$ decreasing exponentially as, say, $\|f_k\|_U \le \Gamma e^{-\beta|k|}$, it would be appropriate to use exponentially decreasing zone functions, for which the preceding discussion is easily modified. In fact, given any $\rho > 0$, the zone function $\phi(r) = \gamma e^{-\rho r}$ generates nonempty Diophantine sets $\mathcal{D}(\phi)$ for small enough $\gamma > 0$. The decay rate $\beta$ of the $f_k$ must of course exceed $\rho$, which can be arranged provided $f$ is analytic in its second argument with analyticity parameter $a > \rho$ (this is an instance of the Paley-Wiener Lemma; cf. [PW] or [BHS]). Roughly speaking, the analyticity parameter $a$ is a measure of the minimum distance by which $f$ may be extended as an analytic function of the complex torus (see also §4.3.3 of [DEG] for an elementary discussion in the two-dimensional case).

It is interesting to note that Diophantine conditions corresponding to exponentially decaying zone functions $\phi$ may be strictly weaker than the weakest small-divisor conditions ordinarily used in dynamical systems, the so-called Bruno conditions (also spelled Brjuno or Bryuno; here "strictly weaker" means that the set $\mathcal{D}(\phi)$ properly contains the set of $\nu$ subject to Bruno conditions). This is however not surprising, since Bruno conditions apply to situations (such as conjugacies of circle diffeomorphisms, or KAM theory) in which countably many series with small divisors must simultaneously converge. By contrast, in Lemma 2 we require the convergence of only two series (in the language of [BHS], ours is a "one-bite" small-divisor problem).

The Ultraviolet Cutoff and Truncated Diophantine Conditions 

Finally, we introduce the notion of ultraviolet cutoff, which is important in physical applications of Diophantine conditions. To understand why, note that typically in applications, the $\nu$ that are required to be Diophantine are physical parameters. But checking whether a given $\nu$ belongs to a Cantor set of the form $\mathcal{D}(\phi)$ is a practical impossibility, since each point of $\mathcal{D}(\phi)$ has points arbitrarily close to it that are not in $\mathcal{D}(\phi)$. In other words, deciding if $\nu$ belongs to $\mathcal{D}(\phi)$ requires $\nu$ to be specified with infinite precision. Practically of course, it is only possible to specify physical parameters with finite precision. We surmount this difficulty by introducing truncated Diophantine conditions of the form

$$\mathcal{D}(\phi, R) = \{\nu \in \mathbb{R} \mid |e^{2\pi ik\nu} - 1| \ge \phi(|k|),\ k \in \mathbb{Z} \text{ with } 0 < |k| \le R\}. \qquad (4.3)$$

When $\nu \in \mathcal{D}(\phi, R)$, we say $\nu$ is Diophantine to order $R$ with respect to $\phi$, and we call $R$ the truncation order or (ultraviolet) cutoff. Note that $\mathcal{D}(\phi, R)$ is an approximating superset of $\mathcal{D}(\phi)$ with nonempty interior which converges to $\mathcal{D}(\phi)$ as $R \to \infty$. To decide whether $\nu$ belongs to $\mathcal{D}(\phi, R)$, one checks only finitely many inequalities.
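
The finite check can be written out directly. This is a minimal sketch assuming the power-law zone function $\phi(r) = \gamma r^{-(\tau+1)}$ discussed above; the function names are illustrative. Only $k = 1, \ldots, R$ need to be tested, since $k$ and $-k$ give the same modulus.

```python
import cmath
import math

def zone(r, gamma, tau):
    """Power-law zone function phi(r) = gamma * r**(-(tau + 1))."""
    return gamma*r**(-(tau + 1.0))

def is_diophantine_to_order(nu, R, gamma, tau):
    """True if |exp(2*pi*i*k*nu) - 1| >= phi(k) for every integer k with 1 <= k <= R."""
    for k in range(1, R + 1):
        if abs(cmath.exp(2j*math.pi*k*nu) - 1.0) < zone(k, gamma, tau):
            return False
    return True
```

For example, with $\gamma = \tau = 1$ the golden mean passes the test for every cutoff tried, whereas $\nu = 1/3$ passes at order $R = 2$ but fails as soon as $R \ge 3$, where the divisor with $k = 3$ vanishes.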

As a rough general rule, results in dynamical systems which are established for Diophantine sets $\mathcal{D}(\phi)$ may also be established (usually in slightly weaker form) for the corresponding larger, nicer sets $\mathcal{D}(\phi, R)$. The standard technique for doing so involves removing the "$R$-tail" of a series before applying Diophantine conditions, then checking that the tail is small. This technique was called the "ultraviolet cutoff" by Arnold in his proof of the KAM theorem [Ar], and is illustrated in the proof of Lemma 2 below.

4.2.2 Besjes' inequality generalized to functions far from low-order resonance 

Lemma 2. Let $S = \mathbb{R}^d \times \mathbb{R}$, and suppose $g : S \to \mathbb{R}^d$ satisfies assumptions (j), (jj) from Subsection 2.2, along with assumption (jv) from Subsection 2.3. Let the zone function $\phi$ be adapted to $g$ on $\mathbb{R}^d$ in the sense of Eq. (4.2), and define the positive constants $C_1 = C_1(g,\phi)$ and $C_2 = C_2(g,\phi)$ by $C_1 = 2\sum_{0\ne k\in\mathbb{Z}}\|g_k\|_{\mathbb{R}^d}/\phi(|k|)$ and $C_2 = \sum_{0\ne k\in\mathbb{Z}}\|Dg_k\|_{\mathbb{R}^d}/\phi(|k|)$. Let $\nu \in \mathcal{D}(\phi, R)$. If $\{x_n\}_{n=0}^\infty \subset \mathbb{R}^d$ is a sequence for which the successive differences $x_{n+1} - x_n$ are bounded by $M$ (i.e., $\sup_n|x_{n+1} - x_n| \le M$), then

$$\left|\sum_{n=0}^{N-1} g(x_n, n\nu)\right| \le C_1 + N\Big(C_2M + \sum_{|k|>R}\|g_k\|_{\mathbb{R}^d}\Big),$$

where $\sum_{|k|>R}\|g_k\|_{\mathbb{R}^d} \to 0$ as $R \to \infty$.



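Proof. Since $C_1 < \infty$, we write $g$ as its uniformly convergent Fourier series $g(x,\theta) = \sum_{0\ne k\in\mathbb{Z}} g_k(x)e^{2\pi ik\theta}$, so that

$$\left|\sum_{n=0}^{N-1} g(x_n, n\nu)\right| \le \left|\sum_{n=0}^{N-1}\sum_{0<|k|\le R} g_k(x_n)e^{2\pi ikn\nu}\right| + \left|\sum_{n=0}^{N-1}\sum_{|k|>R} g_k(x_n)e^{2\pi ikn\nu}\right|. \qquad (4.4)$$

We shall treat separately each of the double sums on the right-hand side of inequality (4.4). For the first double sum, we reverse the order of summation and use the "summation by parts" formula

$$\sum_{n=0}^{N-1} a_n(b_{n+1} - b_n) = (a_Nb_N - a_0b_0) - \sum_{n=0}^{N-1}(a_{n+1} - a_n)b_{n+1}$$

with $a_n = g_k(x_n)$ and $b_n = e^{2\pi ikn\nu}/(e^{2\pi ik\nu} - 1)$, so that $a_n(b_{n+1} - b_n) = g_k(x_n)e^{2\pi ikn\nu}$. It then follows that

$$\left|\sum_{0<|k|\le R}\sum_{n=0}^{N-1} g_k(x_n)e^{2\pi ikn\nu}\right| = \left|\sum_{0<|k|\le R}\left[\frac{g_k(x_N)e^{2\pi iNk\nu} - g_k(x_0)}{e^{2\pi ik\nu} - 1} - \sum_{n=0}^{N-1}\big(g_k(x_{n+1}) - g_k(x_n)\big)\frac{e^{2\pi i(n+1)k\nu}}{e^{2\pi ik\nu} - 1}\right]\right|$$

$$\le \sum_{0<|k|\le R}\left[\frac{2\|g_k\|_{\mathbb{R}^d}}{|e^{2\pi ik\nu} - 1|} + \sum_{n=0}^{N-1}\frac{\|Dg_k\|_{\mathbb{R}^d}\,|x_{n+1} - x_n|}{|e^{2\pi ik\nu} - 1|}\right] \le \sum_{0<|k|\le R}\frac{2\|g_k\|_{\mathbb{R}^d} + NM\|Dg_k\|_{\mathbb{R}^d}}{|e^{2\pi ik\nu} - 1|}$$

$$\le \sum_{0<|k|\le R}\frac{2\|g_k\|_{\mathbb{R}^d} + NM\|Dg_k\|_{\mathbb{R}^d}}{\phi(|k|)} \le \sum_{0\ne k\in\mathbb{Z}}\frac{2\|g_k\|_{\mathbb{R}^d} + NM\|Dg_k\|_{\mathbb{R}^d}}{\phi(|k|)} = C_1 + NMC_2. \qquad (4.5)$$

We next treat the second double sum (the $R$-tail) on the right-hand side of inequality (4.4) using the simple estimate

$$\left|\sum_{n=0}^{N-1}\sum_{|k|>R} g_k(x_n)e^{2\pi ikn\nu}\right| \le N\sum_{|k|>R}\|g_k\|_{\mathbb{R}^d}, \quad\text{where } \sum_{|k|>R}\|g_k\|_{\mathbb{R}^d} \to 0 \text{ as } R \to \infty. \qquad (4.6)$$

Inserting estimates (4.5) and (4.6) into inequality (4.4) concludes the proof. //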
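
The summation-by-parts bound can be tested on a single harmonic. In this sketch the function $g(x,\theta) = \sin(x)\cos(2\pi\theta)$ is an illustrative choice with $g_{\pm1}(x) = \sin(x)/2$ and all other coefficients zero, so with cutoff $R = 1$ the tail is empty; $\phi(1)$ is taken equal to the divisor $|e^{2\pi i\nu} - 1|$ itself, the largest value for which $\nu$ is Diophantine to order 1.

```python
import cmath
import math

def lemma2_check(nu, M, N_max):
    """Check |sum_{n<N} g(x_n, n*nu)| <= C1 + N*C2*M for g = sin(x)*cos(2*pi*theta), all N <= N_max."""
    divisor = abs(cmath.exp(2j*math.pi*nu) - 1.0)   # plays the role of phi(1)
    C1 = 2.0*(0.5 + 0.5)/divisor      # 2 * sum over k = +1, -1 of ||g_k|| / phi(|k|)
    C2 = (0.5 + 0.5)/divisor          # sum over k = +1, -1 of ||Dg_k|| / phi(|k|)
    x, partial, ok = 0.0, 0.0, True
    for n in range(N_max):
        partial += math.sin(x)*math.cos(2.0*math.pi*n*nu)
        N = n + 1
        ok = ok and abs(partial) <= C1 + N*C2*M     # tail term vanishes for R = 1
        x += M*math.cos(0.11*n)       # any increments with |x_{n+1} - x_n| <= M
    return ok
```

The bound degrades as $\nu$ approaches an integer, where the divisor becomes small; this is the small-divisor effect that the zone function is designed to control.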

Remark 4.6 A related analogous result for flows (but without the ultraviolet cutoff) appears as Lemma 13 of 
[Sa], and in Theorem 2 of [ES], and a more general Besjes-type inequality for so-called KBM vector fields also
appears in [Sa] as Lemma 2. A still more closely related result for flows appears as Lemma 2 in our previous 
paper [DEG] , where it was used in averaging methods applied to certain classes of charged particle motions in 
crystals. 

Remark 4.7 In the case where g has a finite Fourier series, the above proof simplifies in obvious ways; but 
these simplifications become problematic as the Fourier series grows in length (note that the example in §3.3 
has a Fourier series with only four terms). 
4.2.3 Proof of Theorem 2

Assume the hypotheses of Theorem 2 (cf. §2.2; note that assumption (j) ensures the existence of zone 
functions adapted to /, as discussed before Remark 4.4). The proof is essentially the same as the proof of 
Theorem 1 with appropriate changes as needed in order to use Lemma 2. As in the previous proof, the solutions 
xn and clearly exist uniquely for all N £ N. For the approximation relation, we write as before 

N-l JV-l 

\xn -vn\< sl | 

£ XI f{yn,nv)\ 

n=Q n=0 

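where $\tilde f(y,\theta) := f(y,\theta) - \bar f(y)$ is the oscillating part of $f$. The hypotheses clearly imply that $\|f\|_S < \infty$,
and since $\phi$ is adapted to $f$ on $\mathbb{R}^d$, the constants $C_1$ and $C_2$ from Lemma 2 are well defined. We may thus
set $C = (C_1 + C_2 T\|\bar f\|_{\mathbb{R}^d} + \zeta)e^{LT}$. Finally, we fix the parameter $\zeta > 0$ and choose $R_\varepsilon > 0$ so large that
$\sum_{|k|>R_\varepsilon} \|f_k\|_{\mathbb{R}^d} \le \zeta\varepsilon$, where $f_k(x)$ is the $k$th Fourier coefficient of $f$. It is now a simple matter to check that if
$\nu \in \mathcal{D}(\phi, R_\varepsilon)$, then the hypotheses of Lemma 2 are satisfied with $M := \varepsilon\|\bar f\|_{\mathbb{R}^d}$. We thus have

\[
\begin{aligned}
|x_N - y_N| &\le \varepsilon L \sum_{n=0}^{N-1} |x_n - y_n| + \varepsilon C_1 + \varepsilon N \Big( C_2 M + \sum_{|k|>R_\varepsilon} \|f_k\|_{\mathbb{R}^d} \Big) \\
&\le \varepsilon L \sum_{n=0}^{N-1} |x_n - y_n| + \varepsilon C_1 + \varepsilon N \big( C_2 \varepsilon \|\bar f\|_{\mathbb{R}^d} + \zeta\varepsilon \big) ,
\end{aligned}
\]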

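and so for $0 \le N \le T/\varepsilon$, we have $|x_N - y_N| \le \varepsilon L \sum_{n=0}^{N-1} |x_n - y_n| + \varepsilon (C_1 + C_2 T\|\bar f\|_{\mathbb{R}^d} + \zeta)$. Applying
Gronwall's inequality for sequences (Lemma 3, Appendix) gives $|x_N - y_N| \le \varepsilon (C_1 + C_2 T\|\bar f\|_{\mathbb{R}^d} + \zeta) e^{\varepsilon L N} \le
\varepsilon (C_1 + C_2 T\|\bar f\|_{\mathbb{R}^d} + \zeta) e^{LT} = C\varepsilon$ for $0 \le N \le T/\varepsilon$, as claimed. The second part of Theorem 2 (namely
$|x_n - y(n)| \le C'\varepsilon$ for $0 \le n \le T/\varepsilon$) again follows from Lemma 4 (Appendix) and the triangle inequality. //
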
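
The structure of this argument can be checked numerically on a toy example. The following Python sketch is illustrative only and is not taken from the paper: it assumes the hypothetical choice $f(x,\theta) = \sin x + \cos 2\pi\theta$ (so that $\bar f = \sin$, $L = 1$, and the oscillating part does not depend on $y$), the golden-mean frequency $\nu$, and it ignores the compact-support hypothesis. For this $f$ the oscillating sum $\sum_{n<N} \cos 2\pi n\nu$ is bounded by $c = 1/|\sin \pi\nu|$ for every $N$, so the Gronwall step gives $|x_N - y_N| \le \varepsilon c\, e^{LT}$ for $0 \le N \le T/\varepsilon$.

```python
import math

def averaging_error(eps, T, x0=0.5):
    """Return (max_n |x_n - y_n|, eps*c*exp(L*T)) for 0 <= n <= T/eps.

    Toy model (an assumption, not the paper's example):
        full map:     x_{n+1} = x_n + eps*(sin(x_n) + cos(2*pi*n*nu))
        averaged map: y_{n+1} = y_n + eps*sin(y_n)
    """
    nu = (math.sqrt(5.0) - 1.0) / 2.0       # golden-mean frequency (illustrative choice)
    L = 1.0                                  # Lipschitz constant of fbar(y) = sin(y)
    c = 1.0 / abs(math.sin(math.pi * nu))    # bounds |sum_{n<N} cos(2*pi*n*nu)| for all N
    x = y = x0
    worst = 0.0
    for n in range(int(T / eps)):
        x = x + eps * (math.sin(x) + math.cos(2.0 * math.pi * n * nu))
        y = y + eps * math.sin(y)
        worst = max(worst, abs(x - y))
    return worst, eps * c * math.exp(L * T)

worst, bound = averaging_error(eps=0.01, T=1.0)
print(f"max |x_n - y_n| = {worst:.3e}, Gronwall-type bound = {bound:.3e}")
```

Here the role of $C_1$ is played by $c$, while the $C_2$ and tail terms vanish because the toy $\tilde f$ has a single Fourier pair and no $y$-dependence; a general $f$ would need the full statement of Lemma 2.
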
Remark 4.8 It is important to note that for fixed positive $\zeta$ and $\varepsilon$, the ultraviolet cutoff $R_\varepsilon$ need not be
very large to ensure that $\sum_{|k|>R_\varepsilon} \|f_k\|_{\mathbb{R}^d} \le \zeta\varepsilon$, whence the number of inequalities to be checked in Eq. (4.3)
(with $R = R_\varepsilon$) is also modest. In fact, straightforward estimation (comparing the tail sum with an integral) shows that when the Fourier coefficients of $f$
decrease as $\|f_k\|_{\mathbb{R}^d} \le C|k|^{-(p+1)}$ (e.g. when $f$ is of class $C^{p+1}$), it is enough to take $R_\varepsilon \ge 1 + \big(\tfrac{2C}{p\zeta\varepsilon}\big)^{1/p}$ (and
when the coefficients decrease as $\|f_k\|_{\mathbb{R}^d} \le Ce^{-\rho|k|}$, it is enough to take $R_\varepsilon \ge 1 + \ln \big(\tfrac{2C}{\rho\zeta\varepsilon}\big)^{1/\rho}$).
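
The size of the cutoff can be explored numerically. The sketch below is a hedged illustration, not the authors' computation: it assumes the sufficient conditions $R_\varepsilon \ge 1 + (2C/(p\zeta\varepsilon))^{1/p}$ and $R_\varepsilon \ge 1 + \rho^{-1}\ln(2C/(\rho\zeta\varepsilon))$ that follow from comparing the tail sums with integrals, and all function names are hypothetical. It evaluates upper bounds for the tails under the two decay assumptions and compares them with $\zeta\varepsilon$.

```python
import math

def cutoff_algebraic(C, p, zeta, eps):
    """Cutoff when ||f_k|| <= C*|k|**-(p+1) (integral-comparison bound, assumed)."""
    return 1.0 + (2.0 * C / (p * zeta * eps)) ** (1.0 / p)

def cutoff_exponential(C, rho, zeta, eps):
    """Cutoff when ||f_k|| <= C*exp(-rho*|k|) (integral-comparison bound, assumed)."""
    return 1.0 + math.log(2.0 * C / (rho * zeta * eps)) / rho

def tail_algebraic(C, p, R, terms=100000):
    """Upper bound for the sum over |k| > R of C*|k|**-(p+1)."""
    k0 = math.floor(R) + 1                            # smallest integer k with k > R
    partial = sum(2.0 * C * k ** -(p + 1) for k in range(k0, k0 + terms))
    remainder = 2.0 * C * (k0 + terms - 1) ** -p / p  # integral bound on the rest
    return partial + remainder

def tail_exponential(C, rho, R):
    """Exact sum over |k| > R of C*exp(-rho*|k|) (geometric series)."""
    k0 = math.floor(R) + 1
    return 2.0 * C * math.exp(-rho * k0) / (1.0 - math.exp(-rho))

zeta, eps = 1.0, 0.01
R_alg = cutoff_algebraic(C=1.0, p=2, zeta=zeta, eps=eps)
R_exp = cutoff_exponential(C=1.0, rho=1.0, zeta=zeta, eps=eps)
print(R_alg, tail_algebraic(1.0, 2, R_alg), zeta * eps)
print(R_exp, tail_exponential(1.0, 1.0, R_exp), zeta * eps)
```

With these illustrative parameters both cutoffs are of modest size, in line with the remark that only a small number of Diophantine-type inequalities need to be checked.
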

Remark 4.9 If an $O(\varepsilon^2)$ term is added to Eq. (1.4) so that it reads $x_{n+1} = x_n + \varepsilon f(x_n, n\nu) + \varepsilon^2 g(x_n, n\nu)$,
where $g : S \to \mathbb{R}^d$ satisfies the hypotheses of Theorem 2, then it is a simple matter to check that Theorem 2
continues to hold with the order constant $C$ replaced by $C = (C_1 + C_2 T\|\bar f\|_{\mathbb{R}^d} + \zeta + \|g\|_S) e^{LT}$. This form of
Theorem 2 is often useful in applications.

4.2.4 Proof of Theorem 3 

Assume the hypotheses of Theorem 3 (these include those of Theorem 2 together with the additional zero-mean
assumption (jv); cf. §2.3). The hypotheses clearly imply that $\|f\|_S < \infty$, and since $\phi$ is adapted to
$f$ on $\mathbb{R}^d$, the constants $C_1$ and $C_2$ from the conclusion of Lemma 2 are well defined. We may thus choose
the parameter $\zeta > 0$ and set $K_1 = C_1$ and $K_2 = C_2\|f\|_S + \zeta$. Finally, we choose $R_\varepsilon > 0$ so large that
$\sum_{|k|>R_\varepsilon} \|f_k\|_{\mathbb{R}^d} \le \zeta\varepsilon$. It is now a simple matter to check that whenever $\nu \in \mathcal{D}(\phi, R_\varepsilon)$, the hypotheses of
Lemma 2 are satisfied with $M := \varepsilon\|f\|_S$, from which we conclude that



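\[
\begin{aligned}
|x_N - x_0| = \varepsilon \Big| \sum_{n=0}^{N-1} f(x_n, n\nu) \Big|
&\le \varepsilon C_1 + \varepsilon N \Big( C_2 M + \sum_{|k|>R_\varepsilon} \|f_k\|_{\mathbb{R}^d} \Big) \\
&\le \varepsilon C_1 + \varepsilon N \big( C_2 \varepsilon \|f\|_S + \zeta\varepsilon \big) \le K_1 \varepsilon + K_2 \varepsilon^2 N . \quad //
\end{aligned}
\]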

Remark 4.10 The proof of Theorem 3 is so short, and its hypotheses are so closely related to those of Lemma 
2, that it is nearly a corollary of Lemma 2. The interesting features of Theorem 3 are that long-time invariance 
is shown without the traditional transformation of variables, while $\nu$ is required to be Diophantine only to low
order $R_\varepsilon$.
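
A bound of the form $K_1\varepsilon + K_2\varepsilon^2 N$ can be observed on a one-mode toy problem. The Python sketch below is an illustration under stated assumptions, not the paper's construction: it takes the hypothetical zero-mean perturbation $f(x,\theta) = \cos x \cos 2\pi\theta$ and the golden-mean frequency $\nu$. For a single Fourier pair, summation by parts gives $|x_N - x_0| \le c\varepsilon + c\varepsilon^2 N$ with $c = 1/|\sin \pi\nu|$, so $c$ stands in for both $K_1$ and $K_2$ in this toy case only.

```python
import math

def worst_excess(eps, n_steps, x0=0.3):
    """Largest value of |x_N - x_0| - (c*eps + c*eps**2*N) over 1 <= N <= n_steps.

    Toy map (an assumption): x_{n+1} = x_n + eps*cos(x_n)*cos(2*pi*n*nu).
    A non-positive return value means the bound held at every step.
    """
    nu = (math.sqrt(5.0) - 1.0) / 2.0        # golden-mean frequency (illustrative)
    c = 1.0 / abs(math.sin(math.pi * nu))     # bound on partial sums of exp(2*pi*i*n*nu)
    x = x0
    excess = -math.inf
    for n in range(n_steps):
        x += eps * math.cos(x) * math.cos(2.0 * math.pi * n * nu)
        N = n + 1                             # number of steps taken so far
        bound = c * eps + c * eps ** 2 * N
        excess = max(excess, abs(x - x0) - bound)
    return excess

print(worst_excess(eps=0.01, n_steps=10000))
```

The run covers $N$ up to $1/\varepsilon^2$, the timescale on which the $\varepsilon^2 N$ term becomes of order one.
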




4.3 Proofs of Propositions A, B, and C 

For the statements of Propositions A, B, and C, see Subsection 2.4. 

4.3.1 Proof of Proposition A 

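The zone functions enter the proofs of Theorems 2 and 3 only through Lemma 2. It is clear that if $\phi$ is
replaced by $\varepsilon^{\lambda}\phi$ in Eq. (4.5), then the final estimate of Lemma 2 is changed to $\varepsilon^{-\lambda}(C_1 + N M C_2)$. The error bound
in Theorem 2 then changes to $|x_N - y_N| \le \varepsilon^{1-\lambda}(C_1 + C_2 T\|\bar f\|_{\mathbb{R}^d}) e^{LT} + \varepsilon\zeta e^{LT} = O(\varepsilon^{1-\lambda})$ for $0 \le N \le T/\varepsilon$, while
the error bound in Theorem 3 changes to $|x_N - x_0| \le \varepsilon^{1-\lambda} C_1 + \varepsilon N (C_2 \varepsilon^{1-\lambda}\|f\|_S + \zeta\varepsilon) \le K_1 \varepsilon^{1-\lambda} + K_2 \varepsilon^{2-\lambda} N$.
//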

4.3.2 Proof of Proposition B 

Here we give the proof of Proposition B as it applies to Theorem 1 only; the proofs of its applicability to 
Theorems 2 and 3 are nearly the same. 

Fix $\varepsilon > 0$, let $U \subset \mathbb{R}^d$ be open, take $S' = U \times \mathbb{N}$, and suppose $g : S' \to \mathbb{R}^d$, where $g$ is not assumed to
have compact support in $U$ (in other words, $g$ satisfies assumptions (i) and (ii) of §2.1, with $S'$ in place of $S$,
but does not satisfy assumption (iii)).

We now use $g$ to define the systems (1'), (2'), and (3'), which are simply the previous systems (1.1), (1.2),
and (1.3), respectively, in which the perturbation $\varepsilon f$ has been replaced by $\varepsilon g$. We assume that the common
initial condition $x_0 = y_0 = y(0)$ is fixed in $U$, and we choose the positive timescale parameter $T < \beta(x_0)$, where
$[0, \beta(x_0))$ is the maximal forward interval of existence for the initial value problem

\[
\frac{dy}{dt'} = \bar g(y), \qquad y(0) = x_0 \in U, \tag{3''}
\]

which is simply the scaled, $\varepsilon$-independent version of system (3') obtained by introducing the "slow time" $t' = \varepsilon t$.
We then let $Z = \{z \in U \mid z = y(t'),\ 0 \le t' \le T\}$ denote the solution curve of system (3'') over $[0, T]$, and we
choose $\delta > 0$ such that $\delta < \mathrm{dist}(Z, \partial U)$. Then the closure $\bar D(\delta)$ of the open "$\delta$-tube" $D(\delta)$ around $Z$ formed
by the union of open balls of radius $\delta$ having centers in $Z$ is contained in $U$; i.e., $D(\delta) := \bigcup_{t \in [0,T]} B_\delta(y(t)) \subset
\bar D(\delta) \subset U$, where $B_\delta(y)$ denotes the open ball of radius $\delta$ centered on $y$ in $\mathbb{R}^d$.

We next choose $r > 0$ so that the open ball $B_r(x_0)$ contains $\bar D(\delta)$, and we define the compactly supported
function $f : \mathbb{R}^d \times \mathbb{N} \to \mathbb{R}^d$ which (a) coincides with $g$ on $\bar D(\delta) \times \mathbb{N}$, (b) vanishes on $B_r(x_0)^c \times \mathbb{N}$ (here $c$
denotes "complement"), and (c) interpolates $g$ on $B_r(x_0) \cap \bar D(\delta)^c$ in such a way that $f$ is of the same smoothness
class as $g$ and such that $\|f\|_{\mathbb{R}^d \times \mathbb{N}} = \|g\|_{\bar D(\delta) \times \mathbb{N}}$. The existence of such $f$ is guaranteed by the "smooth Tietze
extension theorem" as given, for example, on p. 380 of [AMR].

Using this $f$, and the constant $T$ (from the existence interval $0 \le t' \le T$ of the solution $y(t')$ of (3''),
corresponding to the existence interval $0 \le t \le T/\varepsilon$ for the solution $y(t) = y(\varepsilon t)$ of (3')), we apply Theorem 1
and Lemma 4 from the Appendix to conclude that, for appropriate $C_1, C_2 > 0$, we have:

$|x_n - y_n| \le C_1 \varepsilon$ for $0 \le n \le T/\varepsilon$, $y_n \in D(\delta/2)$, and $x_n \in D(\delta)$; and

$|y_n - y(\varepsilon n)| \le C_2 \varepsilon$ for $0 \le n \le T/\varepsilon$ and $y_n \in D(\delta/2)$.

Using these inequalities together with the triangle inequality, if we now impose a smallness condition on $\varepsilon$ by
requiring it to be strictly less than the threshold $\varepsilon_0 := \min\{\delta/(2C_1), \delta/(2C_2)\}$, we find that the conditions
$y_n \in D(\delta/2)$ and $x_n \in D(\delta)$ are ensured for $0 \le n \le T/\varepsilon$, and it follows that $|x_n - y(\varepsilon n)| \le (C_1 + C_2)\varepsilon < \delta$ also
holds for $0 \le n \le T/\varepsilon$. Finally, since $x_n$, $y_n$, and $y(n) = y(\varepsilon n)$ remain in $D(\delta)$ for $0 \le n \le T/\varepsilon$, and since $f$ and
$g$ coincide on $D(\delta)$, we see that whenever $0 < \varepsilon < \varepsilon_0$, the dynamics of systems (1'), (2'), and (3') coincide with
the dynamics of the respective systems (1.1), (1.2), and (1.3) on the interval $0 \le n \le T/\varepsilon$, which completes the
proof. //

Remark 4.11 In the above proof, the order constants $C_1$, $C_2$ and the threshold $\varepsilon_0$ depend on $\delta$. We note
that, since the motions of systems (1.1), (1.2), and (1.3) remain in the $\delta$-tube $D(\delta)$, the uniform norms which
appear in the proofs of Theorems 1, 2, and 3 may be taken over $D(\delta)$ rather than all of $\mathbb{R}^d$.
4.3.3 Proof of Proposition C

Let $g(u,n) := (f(x, nq/p + \tau), a)^T$ where $u := (x,\tau)^T$, so that $g : U \times \mathbb{R} \times \mathbb{N} \to \mathbb{R}^{d+1}$. The system
$u_{n+1} = u_n + \varepsilon g(u_n, n)$ clearly satisfies the hypotheses of Proposition B applied to Theorem 1, with $d$ replaced
by $d+1$ and $U$ replaced by $U \times \mathbb{R}$. Thus the conclusion of Theorem 1 applies to $u_n$ as well as to $x_n$. The constants
$c$ and $c'$ may be easily estimated along the lines of the proofs of Theorem 1 and Lemma 4 respectively. Taking
into account Remark 4.11, we find $c(T, |a|) = (L_g T \|\bar g\|_{\bar D} + \|\tilde g\|_{\bar D \times \mathbb{N}}) e^{L_g T}$ and $c'(T, |a|) = T L_g \|\bar g\|_{\bar D} e^{L_g T}$. Here
$L_g$ is the $u$-Lipschitz constant of $g$, which is independent of $a$. On the other hand, the norms $\|\bar g\|_{\bar D}$ and $\|\tilde g\|_{\bar D \times \mathbb{N}}$
depend on $|a|$, since $\|\bar g\|_{\bar D} = \sup_{u \in \bar D} \sqrt{|\bar f(u)|^2 + a^2}$ and $\|\tilde g\|_{\bar D \times \mathbb{N}} = \sup_{u \in \bar D,\, n \in \mathbb{N}} \sqrt{|\bar f(u) - f(x, nq/p + \tau)|^2 + a^2}$.
//

Appendix. 

In this appendix, for the sake of completeness we supply statements and proofs of two elementary results with 
which the reader may be unfamiliar. 

Lemma 3 (The Gronwall inequality for sequences). Let $A > 0$, $B > 0$, and let $\{E_n\}_{n=0}^{\infty}$ be a sequence of
nonnegative real numbers with $E_0 = 0$ satisfying $E_N \le A \sum_{n=0}^{N-1} E_n + B$. Then $E_N \le B e^{AN}$.

Proof. Set $R_{N-1} = A \sum_{n=0}^{N-1} E_n + B$, so that $R_N - R_{N-1} = A E_N \le A R_{N-1}$ and hence $R_N \le (1 + A) R_{N-1}$. Proceeding
inductively, we find that $R_N \le (1 + A) R_{N-1} \le \dots \le (1 + A)^N R_0 = B(1 + A)^N \le B e^{AN}$, where we have used
$R_0 = B$ and $s > 0 \Rightarrow (1+s)^{1/s} < e$. Since $E_N \le R_{N-1} \le R_N$, the claim follows. //
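
As a quick numerical sanity check of Lemma 3, the following Python sketch (illustrative only; the function name and parameter values are assumptions) builds the extremal sequence obtained by taking equality in the hypothesis, namely $E_0 = 0$ and $E_N = A \sum_{n<N} E_n + B$ for $N \ge 1$, and compares it with $B e^{AN}$. For this sequence $E_N = B(1+A)^{N-1}$, which is the worst case allowed by the hypothesis.

```python
import math

def extremal_sequence(A, B, n_max):
    """E_0 = 0 and E_N = A*sum_{n<N} E_n + B for 1 <= N <= n_max (equality case)."""
    E = [0.0]
    running = 0.0                      # holds sum_{n<N} E_n
    for N in range(1, n_max + 1):
        running += E[N - 1]
        E.append(A * running + B)
    return E

A, B = 0.05, 2.0
E = extremal_sequence(A, B, 200)
gaps = [B * math.exp(A * N) - E[N] for N in range(len(E))]
print(min(gaps) >= 0.0)               # True when the Gronwall bound holds at every N
```
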

Lemma 4 (Equivalence of autonomous flows and maps). Let $\varepsilon > 0$, and suppose $\bar f : \mathbb{R}^d \to \mathbb{R}^d$ is
Lipschitz continuous and has compact support. Then

the map $y_{n+1} = y_n + \varepsilon \bar f(y_n)$ (1.5) and the flow $\dfrac{dy}{dt} = \varepsilon \bar f(y)$ (1.6)

are equivalent in the sense that there exists a constant $K > 0$ such that the solutions $y_n$ and $y(t)$ of (1.5) and
(1.6), respectively, with common initial condition $y_0 = y(0) \in \mathbb{R}^d$ satisfy the nearness condition $|y_n - y(n)| \le
K\varepsilon$ for $0 \le n \le T/\varepsilon$.

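Proof. Let $L > 0$ denote the global Lipschitz constant of $\bar f$. First we note that $y(n+1) - y(n) =
\varepsilon \int_n^{n+1} \bar f(y(t))\,dt = \varepsilon \bar f(y(n)) + \varepsilon \int_n^{n+1} (\bar f(y(t)) - \bar f(y(n)))\,dt$. Thus $y_{n+1} - y(n+1) = y_n - y(n) + \varepsilon (\bar f(y_n) -
\bar f(y(n))) - \varepsilon \int_n^{n+1} (\bar f(y(t)) - \bar f(y(n)))\,dt$. Now setting $E_n = |y_n - y(n)|$, we obtain $E_{n+1} \le E_n + \varepsilon L E_n + \varepsilon^2 L \|\bar f\|_{\mathbb{R}^d}$,
since $\varepsilon \big| \int_n^{n+1} (\bar f(y(t)) - \bar f(y(n)))\,dt \big| \le \varepsilon L \int_n^{n+1} |y(t) - y(n)|\,dt \le \varepsilon^2 L \|\bar f\|_{\mathbb{R}^d}$. Using this last inequality to form a
telescoping sum, we arrive at $E_n - E_0 \le \varepsilon L \sum_{k=0}^{n-1} E_k + n\varepsilon^2 L \|\bar f\|_{\mathbb{R}^d}$, or $E_n \le \varepsilon L \sum_{k=0}^{n-1} E_k + \varepsilon T L \|\bar f\|_{\mathbb{R}^d}$ (since
$E_0 = 0$ and $0 \le n \le T/\varepsilon$). Finally, we apply the Gronwall inequality for sequences (Lemma 3, above) to get
$E_n \le \varepsilon T L \|\bar f\|_{\mathbb{R}^d} e^{\varepsilon L n} \le \varepsilon T L \|\bar f\|_{\mathbb{R}^d} e^{LT}$, so the desired conclusion is true with $K = T L \|\bar f\|_{\mathbb{R}^d} e^{LT}$. //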
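
The nearness estimate of Lemma 4 can be checked on a case where the flow is known in closed form. The Python sketch below is an illustration under stated assumptions rather than part of the proof: it uses the hypothetical vector field $\bar f(y) = -y$, which is not compactly supported but agrees on $|y| \le |y_0|$ with a compactly supported Lipschitz function, and both orbits stay in that region when $0 < \varepsilon < 1$. Then $L = 1$, the sup norm over the region is $|y_0|$, the map gives $y_n = y_0(1-\varepsilon)^n$, and the flow gives $y(n) = y_0 e^{-\varepsilon n}$.

```python
import math

def map_vs_flow(eps, T, y0=1.0):
    """Return (max_n |y_n - y(n)|, K*eps) for 0 <= n <= T/eps with fbar(y) = -y.

    K = T*L*sup|fbar|*exp(L*T) is the constant from the proof, with L = 1 and
    sup|fbar| = |y0| taken over the region visited by the orbits (an assumption).
    """
    L = 1.0
    sup_f = abs(y0)
    K = T * L * sup_f * math.exp(L * T)
    y = y0
    worst = 0.0
    for n in range(1, int(T / eps) + 1):
        y = y + eps * (-y)                        # map (1.5) with fbar(y) = -y
        flow = y0 * math.exp(-eps * n)            # exact solution of (1.6) at t = n
        worst = max(worst, abs(y - flow))
    return worst, K * eps

worst, bound = map_vs_flow(eps=0.01, T=2.0)
print(f"max |y_n - y(n)| = {worst:.3e} <= K*eps = {bound:.3e}")
```

In this example the observed gap is far below $K\varepsilon$, which reflects the fact that the Gronwall constant is a worst-case bound.
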